SYNTHETIC PLANT GENES AND METHOD FOR PREPARATION

BACKGROUND OF THE INVENTION 

The present invention relates to genetic 
engineering and more particularly to plant 
transformation in which a plant is transformed to 
express a heterologous gene. 

Although great progress has been made in recent years with respect to transgenic plants which express foreign proteins such as herbicide resistant enzymes and viral coat proteins, very little is known about the major factors affecting expression of foreign genes in plants. Several potential factors could be responsible in varying degrees for the level of protein expression from a particular coding sequence. The level of a particular mRNA in the cell is certainly a critical factor.

The potential causes of low steady state levels of mRNA due to the nature of the coding sequence are many. First, full length RNA synthesis might not occur at a high frequency. This could, for example, be caused by the premature termination of RNA during transcription or by unexpected mRNA processing during transcription. Second, full length RNA could be produced but then processed (splicing, polyA addition) in the nucleus in a fashion that creates a nonfunctional mRNA. If the RNA is properly synthesized, terminated and polyadenylated, it can then move to the cytoplasm for translation. In the cytoplasm, mRNAs have distinct half lives that are determined by their sequences and by the cell type in which they are expressed. Some RNAs are very short-lived and some are much more long-lived. In addition, translational efficiency has an effect of uncertain magnitude on mRNA half-life. Furthermore, every RNA molecule folds into a particular structure, or perhaps family of structures, which is determined by its sequence. The particular structure of any RNA might lead to greater or lesser stability in the cytoplasm. Structure per se is probably also a determinant of mRNA processing in the nucleus. Unfortunately, it is impossible to predict, and nearly impossible to determine, the structure of any RNA (except for tRNA) in vitro or in vivo. However, it is likely that dramatically changing the sequence of an RNA will have a large effect on its folded structure. It is likely that structure per se or particular structural features also have a role in determining RNA stability.

Some particular sequences and signals have been identified in RNAs that have the potential for having a specific effect on RNA stability. This section summarizes what is known about these sequences and signals. These identified sequences often are A+T rich, and thus are more likely to occur in an A+T rich coding sequence such as a B.t. gene. The sequence motif ATTTA (or AUUUA as it appears in RNA) has been implicated as a destabilizing sequence in mammalian cell mRNA (Shaw and Kamen, 1986). No analysis of the function of this sequence in plants has been done.
Many short lived mRNAs have A+T rich 3' untranslated regions, and these regions often have the ATTTA sequence, sometimes present in multiple copies or as multimers (e.g., ATTTATTTA...). Shaw and Kamen showed that the transfer of the 3' end of an unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half life dramatically. They further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message, and that this signal could exert its effect whether it was located at the 3' end or within the coding sequence. However, the number of ATTTA sequences and/or the sequence context in which they occur also appear to be important in determining whether they function as destabilizing sequences. Shaw and Kamen showed that a trimer of ATTTA had much less effect than a pentamer on mRNA stability, and a dimer or a monomer had no effect on stability (Shaw and Kamen, 1987). Note that multimers of ATTTA such as a pentamer automatically create an A+T rich region. The destabilization was shown to be a cytoplasmic effect, not a nuclear one. In other unstable mRNAs, the ATTTA sequence may be present in only a single copy, but it is often contained in an A+T rich region. From the animal cell data collected to date, it appears that ATTTA at least in some contexts is important in stability, but it is not yet possible to predict which occurrences of ATTTA are destabilizing elements or whether any of these effects are likely to be seen in plants.
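
Because the argument above turns on how many copies of ATTTA are present and whether they form multimers, a short scan makes the counting concrete. The following is a minimal sketch in Python, assuming a plain DNA or RNA string as input; the function names are hypothetical, and treating copies whose starts are 4 bases apart (as in ATTTATTTA, where adjacent copies share an A) as one tandem run is an illustrative assumption rather than a rule from Shaw and Kamen. It only locates the motif; it cannot say which occurrences act as destabilizing elements.

```python
import re


def attta_positions(seq: str) -> list[int]:
    """0-based start of every ATTTA, counting overlapping copies (RNA AUUUA accepted)."""
    seq = seq.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping copies (ATTTATTTA) both be reported.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq)]


def longest_tandem_run(positions: list[int]) -> int:
    """Longest run of copies starting 4 bases apart, i.e. sharing an A as in ATTTATTTA."""
    best = run = 0
    prev = None
    for p in positions:
        run = run + 1 if prev is not None and p - prev == 4 else 1
        best = max(best, run)
        prev = p
    return best


hits = attta_positions("AUUUAUUUAUUUA")
print(hits)                      # [0, 4, 8]
print(longest_tandem_run(hits))  # 3, i.e. a trimer
```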

Some studies on mRNA degradation in animal cells also indicate that RNA degradation may begin in some cases with nucleolytic attack in A+T rich regions. It is not clear if these cleavages occur at ATTTA sequences. There are also examples of mRNAs that have differential stability depending on the cell type in which they are expressed or on the stage within the cell cycle at which they are expressed. For example, histone mRNAs are stable during DNA synthesis but unstable if DNA synthesis is disrupted. The 3' end of some histone mRNAs seems to be responsible for this effect (Pandey and Marzluff, 1987). It does not appear to be mediated by ATTTA, nor is it clear what controls the differential stability of this mRNA. Another example is the differential stability of IgG mRNA in B lymphocytes during B cell maturation (Genovese and Milcarek, 1988). A final example is the instability of a mutant beta-thalassemic globin mRNA. In bone marrow cells, where this gene is normally expressed, the mutant mRNA is unstable, while the wild-type mRNA is stable. When the mutant gene is expressed in HeLa or L cells in vitro, the mutant mRNA shows no instability (Lim et al., 1988). These examples all provide evidence that mRNA stability can be mediated by cell type or cell cycle specific factors. Furthermore, this type of instability is not yet associated with specific sequences. Given these uncertainties, it is not possible to predict which RNAs are likely to be unstable in a given cell. In addition, even the ATTTA motif may act differentially depending on the nature of the cell in which the RNA is present. Shaw and Kamen (1987) have reported that activation of protein kinase C can block degradation mediated by ATTTA.


The addition of a polyadenylate string to the 3' end is common to most eukaryotic mRNAs, both plant and animal. The currently accepted view of polyA addition is that the nascent transcript extends beyond the mature 3' terminus. Contained within this transcript are signals for polyadenylation and proper 3' end formation. This processing at the 3' end involves cleavage of the mRNA and addition of polyA to the mature 3' end. By searching for consensus sequences near the polyA tract in both plant and animal mRNAs, it has been possible to identify consensus sequences that apparently are involved in polyA addition and 3' end cleavage. The same consensus sequences seem to be important to both of these processes. These signals are typically a variation on the sequence AATAAA. In animal cells, some variants of this sequence that are functional have been identified; in plant cells there seems to be an extended range of functional sequences (Wickens and Stephenson, 1984; Dean et al., 1986).

Because all of these consensus sequences are variations on AATAAA, they all are A+T rich sequences. This sequence is typically found 15 to 20 bp before the polyA tract in a mature mRNA. Experiments in animal cells indicate that this sequence is involved in both polyA addition and 3' maturation. Site directed mutations in this sequence can disrupt these functions (Conway and Wickens, 1988; Wickens et al., 1987). However, it has also been observed that sequences up to 50 to 100 bp 3' to the putative polyA signal are also required; i.e., a gene that has a normal AATAAA but whose downstream sequence has been replaced or disrupted does not get properly polyadenylated (Gil and Proudfoot, 1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). That is, the polyA signal itself is not sufficient for complete and proper processing. It is not yet known what specific downstream sequences are required in addition to the polyA signal, or if there is a specific sequence that has this function. Therefore, sequence analysis can only identify potential polyA signals.

In naturally occurring mRNAs that are normally polyadenylated, it has been observed that disruption of this process, either by altering the polyA signal or other sequences in the mRNA, can have profound effects on the level of functional mRNA. This has been observed in several naturally occurring mRNAs, with results that are gene specific so far. There are no general rules that can be derived yet from the study of mutants of these natural genes, and no rules that can be applied to heterologous genes. Below are four examples:

1. In a globin gene, absence of a proper polyA site leads to improper termination of transcription. It is likely, but not proven, that the improperly terminated RNA is nonfunctional and unstable (Proudfoot et al., 1987).

2. In a globin gene, absence of a functional 
polyA signal can lead to a 100-fold decrease in the 
level of mRNA accumulation (Proudfoot et al., 1987). 

3. A globin gene polyA site was placed into the 3' ends of two different histone genes. The histone genes contain a secondary structure (stem-loop) near their 3' ends. The amount of properly polyadenylated histone mRNA produced from these chimeras decreased as the distance between the stem-loop and the polyA site increased. Also, the two histone genes produced greatly different levels of properly polyadenylated mRNA. This suggests an interaction between the polyA site and other sequences on the mRNA that can modulate mRNA accumulation (Pandey and Marzluff, 1987).

4. The soybean leghemoglobin gene has been cloned into HeLa cells, and it has been determined that this plant gene contains a "cryptic" polyadenylation signal that is active in animal cells, but is not utilized in plant cells. This leads to the production of a new polyadenylated mRNA that is nonfunctional. This again shows that analysis of a gene in one cell type cannot predict its behavior in alternative cell types (Wiebauer et al., 1988).

From these examples, it is clear that in natural mRNAs proper polyadenylation is important in mRNA accumulation, and that disruption of this process can affect mRNA levels significantly. However, insufficient knowledge exists to predict the effect of changes in a normal gene. In a heterologous gene, where we do not know if the putative polyA sites (consensus sequences) are functional, it is even harder to predict the consequences. However, it is possible that the putative sites identified are dysfunctional. That is, these sites may not act as proper polyA sites, but instead function as aberrant sites that give rise to unstable mRNAs.

In animal cell systems, AATAAA is by far the most common signal identified in mRNAs upstream of the polyA, but at least four variants have also been found (Wickens and Stephenson, 1984). In plants, not nearly so much analysis has been done, but it is clear that multiple sequences similar to AATAAA can be used. The plant sites below called major or minor refer only to the study of Dean et al. (1986), which analyzed only three types of plant gene. The designation of polyadenylation sites as major or minor refers only to the frequency of their occurrence as functional sites in naturally occurring genes that have been analyzed. In the case of plants this is a very limited database. It is hard to predict with any certainty that a site designated major or minor is more or less likely to function partially or completely when found in a heterologous gene such as B.t.


Site    Sequence   Designation
PA      AATAAA     Major consensus site
P1A     AATAAT     Major plant site
P2A     AACCAA     Minor plant site
P3A     ATATAA     Minor plant site
P4A     AATCAA     Minor plant site
P5A     ATACTA     Minor plant site
P6A     ATAAAA     Minor plant site
P7A     ATGAAA     Minor plant site
P8A     AAGCAT     Minor plant site
P9A     ATTAAT     Minor plant site
P10A    ATACAT     Minor plant site
P11A    AAAATA     Minor plant site
P12A    ATTAAA     Minor animal site
P13A    AATTAA     Minor animal site
P14A    AATACA     Minor animal site
P15A    CATAAA     Minor animal site
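
Since sequence analysis can only identify potential polyA signals, the screen itself is simple: slide a 6-base window along the sense strand and look each hexamer up in the table of sites above. The Python below is a minimal sketch; the dictionary and function names are hypothetical, and a hit says nothing about whether the site is actually used in a plant cell.

```python
# Putative polyA signals from the table (site name -> hexamer, sense strand).
POLYA_SIGNALS = {
    "PA": "AATAAA", "P1A": "AATAAT", "P2A": "AACCAA", "P3A": "ATATAA",
    "P4A": "AATCAA", "P5A": "ATACTA", "P6A": "ATAAAA", "P7A": "ATGAAA",
    "P8A": "AAGCAT", "P9A": "ATTAAT", "P10A": "ATACAT", "P11A": "AAAATA",
    "P12A": "ATTAAA", "P13A": "AATTAA", "P14A": "AATACA", "P15A": "CATAAA",
}
# Every hexamer is distinct, so a reverse lookup table is safe.
_NAME_BY_HEXAMER = {hexamer: name for name, hexamer in POLYA_SIGNALS.items()}


def find_polya_signals(seq: str) -> list[tuple[int, str, str]]:
    """(0-based position, site name, hexamer) for each putative signal on this strand."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        name = _NAME_BY_HEXAMER.get(hexamer)
        if name is not None:
            hits.append((i, name, hexamer))
    return hits


print(find_polya_signals("AATAAAA"))  # [(0, 'PA', 'AATAAA'), (1, 'P6A', 'ATAAAA')]
```

Overlapping hits are reported separately, which matches how A+T rich stretches can contain several candidate signals at once.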


Another type of RNA processing that occurs in the nucleus is intron splicing. Nearly all of the work on intron processing has been done in animal cells, but some data are emerging from plants. Intron processing depends on proper 5' and 3' splice junction sequences. Consensus sequences for these junctions have been derived for both animal and plant mRNAs, but only a few nucleotides are known to be invariant. Therefore, it is hard to predict with any certainty whether a putative splice junction is functional or partially functional based solely on sequence analysis. In particular, the only invariant nucleotides are GT at the 5' end of the intron and AG at the 3' end of the intron. In plants, at every nearby position, either within the intron or in the exon flanking the intron, all four nucleotides can be found, although some positions show some nucleotide preference (Brown, 1986; Hanley and Schuler, 1988).

A plant intron has been moved from a patatin gene into a GUS gene. To do this, site directed mutagenesis was performed to introduce new restriction sites, and this mutagenesis changed several nucleotides in the intron and exon sequences flanking the GT and AG. This intron still functioned properly, indicating the importance of the GT and AG and the flexibility at other nucleotide positions. There are of course many occurrences of GT and AG in all genes that do not function as intron splice junctions, so there must be some other sequence or structural features that identify splice junctions. In plants, one such feature appears to be base composition per se. Wiebauer et al. (1988) and Goodall et al. (1988) have analyzed plant introns and exons and found that exons have ~50% A+T while introns have ~70% A+T. Goodall et al. (1988) also created an artificial plant intron that has consensus 5' and 3' splice junctions and a random A+T rich internal sequence. This intron was spliced correctly in plants. When the internal segment was replaced by a G+C rich sequence, splicing efficiency was drastically reduced. These two examples demonstrate that intron recognition in plants may depend on very general features: splice junctions that have a great deal of sequence diversity, and A+T richness of the intron itself. This, of course, makes it difficult to predict from sequence alone whether any particular sequence is likely to function as an active or partially active intron for RNA processing.
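
The two general features just described, invariant GT/AG ends and an A+T rich interior, can be combined into a crude screen for intron-like segments. The sketch below is illustrative only: the function names and the length and 70% A+T cutoffs are hypothetical parameters chosen for the example (the 70% figure echoes the intron composition cited above), and a hit is a candidate, not a demonstrated splice site.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A or T in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def candidate_introns(seq: str, min_len: int = 60, max_len: int = 1000,
                      min_at: float = 0.70) -> list[tuple[int, int]]:
    """(start, end) pairs, end exclusive, for segments that begin with GT, end
    with AG, are min_len..max_len bases long, and whose interior (between the
    GT and the AG) is at least min_at A+T. Thresholds are hypothetical."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    hits = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # include the AG itself
            if min_len <= end - d <= max_len and at_fraction(seq[d + 2:a]) >= min_at:
                hits.append((d, end))
    return hits


exon = "GCGCGCGC"
at_rich = exon + "GT" + "AT" * 20 + "AG" + "GCGCGC"
gc_rich = exon + "GT" + "GC" * 20 + "AG" + "GCGCGC"
print(candidate_introns(at_rich, min_len=20))  # [(8, 52)]
print(candidate_introns(gc_rich, min_len=20))  # []
```

The G+C rich variant fails the screen even though its GT and AG are unchanged, mirroring the Goodall et al. observation; the all-pairs loop is quadratic and is meant only for short illustrative sequences.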

Because B.t. genes are A+T rich, they contain numerous stretches of various lengths that have 70% or greater A+T. The number of such stretches identified by sequence analysis depends on the length of sequence scanned.
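
The dependence on scan length can be seen with a sliding-window count. The Python sketch below uses a hypothetical helper that marks every window of a chosen length containing at least 70% A+T, and a second hypothetical helper that merges consecutive qualifying windows into stretches; both names and the merging rule are assumptions for illustration.

```python
def at_rich_windows(seq: str, window: int, min_at_percent: int = 70) -> list[int]:
    """0-based start of every window of `window` bases (sliding by one base)
    whose A+T content is at least min_at_percent."""
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        return []
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # A+T count in the first window
    starts = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide: add the base entering on the right, drop the one leaving on the left.
            count += is_at[start + window - 1] - is_at[start - 1]
        if 100 * count >= min_at_percent * window:  # integer test avoids float rounding
            starts.append(start)
    return starts


def count_stretches(starts: list[int]) -> int:
    """Merge runs of consecutive window starts; each run counts as one stretch."""
    return sum(1 for i, s in enumerate(starts) if i == 0 or s != starts[i - 1] + 1)


seq = "GGGGG" + "A" * 10 + "GGGGG"
print(at_rich_windows(seq, 10))                   # [2, 3, 4, 5, 6, 7, 8]
print(count_stretches(at_rich_windows(seq, 10)))  # 1
print(at_rich_windows(seq, 20))                   # [] (the whole sequence is only 50% A+T)
```

The same ten A residues count as an A+T rich stretch in a 10-base scan but not in a 20-base scan, which is the sense in which the count depends on the length scanned.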

As for polyadenylation described above, there are complications in predicting what sequences might be utilized as splice sites in any given gene. First, many naturally occurring genes have alternative splicing pathways that create alternative combinations of exons in the final mRNA (Gallega and Nadal-Ginard, 1988; Helfman and Ricci, 1988; Tsurushita and Korn, 1989). That is, some splice junctions are apparently recognized under some circumstances or in certain cell types, but not in others. The rules governing this are not understood. In addition, there can be an interaction between processing paths such that utilization of a particular polyadenylation site can interfere with splicing at a nearby splice site and vice versa (Adami and Nevins, 1988; Brady and Wold, 1988; Marzluff and Pandey, 1988). Again, no predictive rules are available. Also, sequence changes in a gene can drastically alter the utilization of particular splice junctions. For example, in a bovine growth hormone gene, small deletions in an exon a few hundred bases downstream of an intron cause the splicing efficiency of the intron to drop from greater than 95% to less than 2% (essentially nonfunctional). Other deletions, however, have essentially no effect (Hampson and Rottman, 1988). Finally, a variety of in vitro and in vivo experiments indicate that mutations that disrupt normal splicing lead to rapid degradation of the RNA in the nucleus. Splicing is a multistep process in the nucleus, and mutations in normal splicing can lead to blockades in the process at a variety of steps. Any of these blockades can then lead to an abnormal and unstable RNA. Studies of mutants of normally processed (polyadenylation and splicing) genes are relevant to the study of heterologous genes such as B.t. B.t. genes might contain functional signals that lead to the production of aberrant nonfunctional mRNAs, and these mRNAs are likely to be unstable. But the B.t. genes are perhaps even more likely to contain signals that are analogous to mutant signals in a natural gene. As shown above, these mutant signals are very likely to cause defects in the processing pathways whose consequence is to produce unstable mRNAs.

It is not known with any certainty what signals RNA transcription termination in plant or animal cells. Some studies on animal genes indicate that stretches of sequence rich in T cause termination by calf thymus RNA polymerase II in vitro. These studies have shown that the 3' ends of in vitro terminated transcripts often lie within runs of T such as T5, T6 or T7. Other identified sites have not been composed solely of T, but have had one or more other nucleotides as well. Termination has been found to occur within the sequences TATTTTTT, ATTCTC and TTCTT (Dedrick et al., 1987; Reines et al., 1987). In the case of these latter two, the context in which the sequence is found has been C+T rich as well. It is not known if this is essential. Other studies have implicated stretches of A as potential transcriptional terminators. An interesting example from SV40 illustrates the uncertainty in defining terminators based on sequence alone. One potential terminator in SV40 was identified as being A rich and having a region of dyad symmetry (potential stem-loop) 5' to the A rich stretch. However, a second terminator identified experimentally downstream in the same gene was not A rich and included no potential secondary structure (Kessler et al., 1988). Of course, due to the A+T content of B.t. genes, they are rich in runs of A or T that could act as terminators. The importance of termination to stability of the mRNA is shown by the globin gene example described above. Absence of a normal polyA site leads to a failure in proper termination with a consequent decrease in mRNA.
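
The run-of-T and run-of-A criteria above are simple enough to scan for directly. The Python sketch below lists maximal runs of a chosen base (by default at least 5, matching the T5 to T7 runs mentioned) and the positions of the three T-rich sites cited; the function names and the run-length cutoff are hypothetical, and because these sites were defined with calf thymus RNA polymerase II in vitro, a hit is at most a candidate terminator in plants.

```python
import re

# T-rich sites at which in vitro termination was reported (Dedrick et al., 1987;
# Reines et al., 1987). Their relevance to plant transcription is unknown.
REPORTED_SITES = ("TATTTTTT", "ATTCTC", "TTCTT")


def homopolymer_runs(seq: str, base: str = "T", min_len: int = 5) -> list[tuple[int, int]]:
    """(0-based start, length) of every maximal run of `base` at least min_len long."""
    pattern = re.escape(base.upper()) + "{%d,}" % min_len
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq.upper())]


def reported_site_hits(seq: str) -> dict[str, list[int]]:
    """0-based start positions (overlaps allowed) of each reported T-rich site."""
    seq = seq.upper()
    return {site: [m.start() for m in re.finditer("(?=%s)" % site, seq)]
            for site in REPORTED_SITES}


seq = "GGTTTTTTGGAAAAACC"
print(homopolymer_runs(seq))       # [(2, 6)]  a T6 run
print(homopolymer_runs(seq, "A"))  # [(10, 5)] an A5 run
```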

There is also an effect on mRNA stability due to the translation of the mRNA. Premature translational termination in human triose phosphate isomerase leads to instability of the mRNA (Daar et al., 1988). Another example is the beta-thalassemic globin mRNA described above that is specifically unstable in bone marrow cells (Lim et al., 1988). The defect in this mutant gene is a single base pair deletion at codon 44 that leads to translational termination (a nonsense codon) at codon 60. Compared to properly translated normal globin mRNA, this mutant RNA is very unstable. These results indicate that an improperly translated mRNA is unstable. Other work in yeast indicates that proper but poor translation can have an effect on mRNA levels. A heterologous gene was modified to convert certain codons to more yeast preferred codons. An overall 10-fold increase in protein production was achieved, but there was also about a 3-fold increase in mRNA (Boekema et al., 1987). This indicates that more efficient translation can lead to greater mRNA stability, and that the effect of codon usage can be at the RNA level as well as the translational level. It is not clear from codon usage studies which codons lead to poor translation, or how this is coupled to mRNA stability.
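
The globin example rests on simple reading-frame arithmetic: deleting one base shifts every downstream codon, so a stop triplet that was out of frame can come into frame. The Python sketch below shows this on a short made-up sequence; the sequence and function names are hypothetical and do not represent the globin gene.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """1-based number of the first in-frame stop codon, reading from the first base."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def delete_base(cds: str, position: int) -> str:
    """Remove the base at a 0-based position (a single-base deletion)."""
    return cds[:position] + cds[position + 1:]


# Hypothetical toy gene: ATG GCT GAC TTA AAA TAA (stop at codon 6).
wild_type = "ATGGCTGACTTAAAATAA"
mutant = delete_base(wild_type, 3)  # delete the first base of codon 2
print(first_stop_codon(wild_type))  # 6
print(first_stop_codon(mutant))     # 4: ATG CTG ACT TAA, a premature stop
```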

Therefore, it is an object of the present invention to provide a method for preparing synthetic plant genes which express their respective proteins at relatively high levels when compared to wild-type genes. It is yet another object of the present invention to provide synthetic plant genes which express the crystal protein toxin of Bacillus thuringiensis at relatively high levels.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates the steps employed in modifying a wild-type gene to increase expression efficiency in plants.

Figure 2 illustrates a comparison of the changes in the modified B.t.k. HD-1 sequence of Example 1 (lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).

Figure 3 illustrates a comparison of the changes in the synthetic B.t.k. HD-1 sequence of Example 2 (lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).

Figure 4 illustrates a comparison of the changes in 
the synthetic B.t.k. HD-73 sequence of Example 3 
(lower line) versus the wild-type sequence of B.t.k. 
HD-73 (upper line). 

Figure 5 represents a plasmid map of intermediate plant transformation vector cassette pMON893.

Figure 6 represents a plasmid map of intermediate
plant transformation vector cassette pMON900. 

Figure 7 represents a map for the disarmed T-DNA of A. tumefaciens ACO.

Figure 8 illustrates a comparison of the changes in 
the synthetic truncated B.t.k. HD-73 gene (Amino acids 
29-615 with an N-terminal Met-Ala) of Example 3 (lower 
line) versus the wild-type sequence of B.t.k. HD-73 
(upper line). 

Figure 9 illustrates a comparison of the changes in the synthetic/wild-type full length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 10 illustrates a comparison of the changes in the synthetic/modified full length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 11 illustrates a comparison of the changes in the fully synthetic full-length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 12 illustrates a comparison of the changes 
in the synthetic B.t.t. sequence of Example 5 (lower 
line) versus the wild-type sequence of B.t.t. which 
encodes the crystal protein toxin (upper line). 

Figure 13 illustrates a comparison of the changes in the synthetic B.t. P2 sequence of Example 6 (lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the P2 protein toxin (upper line).

Figure 14 illustrates a comparison of the changes in the synthetic B.t. entomocidus sequence of Example 7 (lower line) versus the wild-type sequence of B.t. entomocidus which encodes the Btent protein toxin (upper line).

Figure 15 illustrates a plasmid map for plant expression cassette vector pMON744.

Figure 16 illustrates a comparison of the changes in the synthetic potato leaf roll virus (PLRV) coat protein sequence of Example 9 (lower line) versus the wild-type coat protein sequence of PLRV (upper line).

STATEMENT OF THE INVENTION 

The present invention provides a method for preparing synthetic plant genes which express their protein product at levels significantly higher than the wild-type genes which were commonly employed in plant transformation heretofore. In another aspect, the present invention also provides novel synthetic plant genes which encode non-plant proteins.

For brevity and clarity of description, the present invention will be primarily described with respect to the preparation of synthetic plant genes which encode the crystal protein toxin of Bacillus thuringiensis (B.t.). Suitable B.t. subspecies include, but are not limited to, B.t. kurstaki HD-1, B.t. kurstaki HD-73, B.t. sotto, B.t. berliner, B.t. thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t. galleriae, B.t. aizawai, B.t. subtoxicus, B.t. entomocidus, B.t. tenebrionis and B.t. san diego. However, those skilled in the art will recognize, and it should be understood, that the present method may be used to prepare synthetic plant genes which encode non-plant proteins other than the crystal protein toxin of B.t., as well as plant proteins (see, for instance, Example 9).

The expression of B.t. genes in plants is problematic. Although the expression of B.t. genes in plants at insecticidal levels has been reported, this accomplishment has not been straightforward. In particular, the expression of a full-length lepidopteran specific B.t. gene (comprising DNA from a B.t.k. isolate) has been reported to be unsuccessful in yielding insecticidal levels of expression in some plant species (Vaeck et al., 1987 and Barton et al., 1987).

It has been reported that expression of the full-length gene from B.t.k. HD-1 was detectable in tomato plants but that truncated genes led to a higher frequency of insecticidal plants with an overall higher level of expression. Truncated genes of B.t. berliner also led to a higher frequency of insecticidal plants in tobacco (Vaeck et al., 1987). On the other hand, insecticidal plants were obtained from lettuce transformants using a full-length gene.

It has also been reported that the full length gene from B.t.k. HD-73 gave some insecticidal effect in tobacco (Adang et al., 1987). However, the B.t. mRNA detected in these plants was only 1.7 kb compared to the expected 3.7 kb, indicating improper expression of the gene. It was suggested that this truncated mRNA was too short to encode a functional truncated toxin, but there must have been a low level of longer mRNA in some plants or no insecticidal activity would have been observed. Others have reported in a publication that they observed a large amount of shorter than expected mRNA from a truncated B.t.k. gene, but some mRNA of the expected size was also observed. In fact, it was suggested that expression of the full length gene is toxic to tobacco callus (Barton et al., 1987). The above illustrates that lepidopteran type B.t. genes are poorly expressed in plants compared to other chimeric genes previously expressed from the same promoter cassettes.

The expression of B.t.t. in tomato and potato is at levels similar to that of B.t.k. (i.e., poor). B.t.t. and B.t.k. genes share only limited sequence homology, but they share many common features in terms of base composition and the presence of particular A+T rich elements.

All reports in the field have noted the lower than expected expression of B.t. genes in plants. In general, insecticidal efficacy has been measured using insects very sensitive to B.t. toxin such as tobacco hornworm. Although it has been possible to obtain plants totally protected against tobacco hornworm, it is important to note that hornworm is up to 500-fold more sensitive to B.t. toxin than some agronomically important insect pests such as beet armyworm. It is therefore of interest to obtain transgenic plants that are protected against all important lepidopteran pests (or against Colorado potato beetle in the case of B.t. tenebrionis), and in addition to have a level of B.t. expression that provides an additional safety margin over and above the efficacious protection level. It is also important to devise plant genes which function reproducibly from species to species, so that insect resistant plants can be obtained in a predictable fashion.

In order to achieve these goals, it is important to understand the nature of the poorer than expected expression of B.t. genes in plants. The level of stable B.t. mRNA in plants is much lower than expected. That is, compared to other coding sequences driven by the same promoter, the level of B.t. mRNA measured by Northern analysis or nuclease protection experiments is much lower. For example, tomato plant 337 (Fischhoff et al., 1987) was selected as the best expressing plant with pMON9711, which contains the B.t.k. HD-1 KpnI fragment driven by the CaMV 35S promoter and contains the NOS-NPTII-NOS selectable marker gene. In this plant the level of B.t. mRNA is 100- to 1000-fold lower than the level of NPTII mRNA, even though the 35S promoter is approximately 50-fold stronger than the NOS promoter (Sanders et al., 1987).
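
To make the size of the shortfall explicit, the figures above can be combined under a simplifying assumption that is not made in the original: that, other things being equal, steady-state mRNA scales linearly with promoter strength. Under that assumption the B.t. transcript would be expected at roughly 50 times the NPTII level, while it is observed at 1/100 to 1/1000 of that level, so the deficit relative to expectation is roughly:

```latex
\frac{\text{expected}}{\text{observed}}
  = \frac{50}{10^{-2}} \ \text{to} \ \frac{50}{10^{-3}}
  = 5\times10^{3} \ \text{to} \ 5\times10^{4}\text{-fold}
```

This is an order-of-magnitude reading only; promoter strength and mRNA level are not strictly proportional in practice.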

The level of B.t. toxin protein detected in plants is consistent with the low level of B.t. mRNA. Moreover, the insecticidal efficacy of the transgenic plants correlates with the B.t. protein level, indicating that the toxin protein produced in plants is biologically active. Therefore, the low level of B.t. toxin expression may be the result of the low levels of B.t. mRNA.

Messenger RNA levels are determined by the rate of synthesis and the rate of degradation. It is the balance between these two that determines the steady state level of mRNA. The rate of synthesis has been maximized by the use of the CaMV 35S promoter, a strong constitutive plant expressible promoter. The use of other plant promoters such as nopaline synthase (NOS), mannopine synthase (MAS) and ribulose bisphosphate carboxylase small subunit (RUBISCO) has not led to dramatic changes in the levels of B.t. toxin protein expression, indicating that the effects determining B.t. toxin protein levels are promoter independent. These data imply that the DNA coding sequences of genes encoding B.t. toxin proteins are somehow responsible for the poor expression level, and that this effect is manifested by a low level of accumulated stable mRNA.
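
The balance between synthesis and degradation described above can be written as a standard first-order model. This is an illustrative assumption (a constant synthesis rate k_s and first-order degradation with rate constant k_d acting on the mRNA level M), not a model proposed in the original:

```latex
\frac{dM}{dt} = k_s - k_d M
\quad\Longrightarrow\quad
M_{ss} = \frac{k_s}{k_d} = \frac{k_s \, t_{1/2}}{\ln 2},
\qquad t_{1/2} = \frac{\ln 2}{k_d}
```

Under this model, if k_s is already high and comparable between constructs driven by the same promoter, a lower steady-state level M_ss must come either from a smaller effective k_s for full-length, functional mRNA (for example through premature termination or aberrant processing) or from a larger k_d, that is, a shorter mRNA half-life.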

Lower than expected levels of mRNA have been observed with four different lepidopteran specific genes (two from B.t.k. HD-1, B.t. berliner and B.t.k. HD-73) as well as the gene from the coleopteran specific B.t. tenebrionis. It appears that for lepidopteran type B.t. genes these effects are manifest more strongly in the full length coding sequences than in the truncated coding sequences. These effects are seen across plant species, although their magnitude seems greater in some plant species such as tobacco.
The nature of the coding sequences of B.t. genes distinguishes them from plant genes as well as from many other heterologous genes expressed in plants. In particular, B.t. genes are very rich (~62%) in adenine (A) and thymine (T), while plant genes and most bacterial genes that have been expressed in plants are on the order of 45-55% A+T. The A+T content of the genomes (and thus the genes) of any organism is a feature of that organism and reflects its evolutionary history. Genes within any one organism have similar A+T content, but the A+T content can vary tremendously from organism to organism. For example, some Bacillus species have among the most A+T rich genomes, while some Streptomyces species are among the least A+T rich genomes (~30 to 35% A+T).
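Comparing A+T content as above requires nothing more than a base count. The sketch below is illustrative. `at_content` and `within_plant_like_range` are hypothetical helper names, and the 45-55% band is simply the range quoted in the text.

```python
def at_content(seq: str) -> float:
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return sum(1 for base in s if base in "AT") / len(s)

def within_plant_like_range(seq: str, low: float = 0.45, high: float = 0.55) -> bool:
    """True if A+T content lies in the ~45-55% band quoted for plant genes."""
    return low <= at_content(seq) <= high

print(round(at_content("AATTGC"), 3))
print(within_plant_like_range("AAAATTTTGC"))
```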

Due to the degeneracy of the genetic code and the limited number of codon choices for any amino acid, most of the "excess" A+T of the structural coding sequences of some Bacillus species is found in the third position of the codons. That is, genes of some Bacillus species have A or T as the third nucleotide in many codons. Thus A+T content can, in part, determine codon usage bias. In addition, it is clear that genes evolve for maximum function in the organism in which they evolve. This means that particular nucleotide sequences found in a gene from one organism, where they may play no role except to code for a particular stretch of amino acids, have the potential to be recognized as gene control elements in another organism (such as transcriptional promoters or terminators, polyA addition sites, intron splice sites, or specific mRNA degradation signals). It is perhaps surprising that such misread signals are not a more common feature of heterologous gene expression, but this can be explained in part by the relatively homogeneous A+T content (~50%) of many organisms. This A+T content, together with the nature of the genetic code, puts clear constraints on the likelihood of occurrence of any particular oligonucleotide sequence. Thus, a gene from E. coli with a 50% A+T content is much less likely to contain any particular A+T rich segment than a gene from B. thuringiensis.
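The effect of base composition on the chance of encountering a particular A+T rich signal can be estimated with a crude model. The sketch treats bases as independent, with P(A) = P(T) = (A+T)/2. This is an assumption that ignores codon constraints. Under that model, an AATAAA hexamer is (0.31/0.25)^6, or roughly 3.6 times, more likely at any given position at 62% A+T than at 50%. The `third_position_at` helper gives the codon-level view of the same bias. All function names are hypothetical, and the 1,800 bp length is an arbitrary example.

```python
def third_position_at(cds: str) -> float:
    """Fraction of complete codons (frame starting at index 0) whose third base is A or T."""
    thirds = cds.upper()[2::3]
    if not thirds:
        raise ValueError("need at least one complete codon")
    return sum(1 for base in thirds if base in "AT") / len(thirds)

def motif_probability(motif: str, at_fraction: float) -> float:
    """Chance that a given position starts `motif`, assuming independent bases
    with P(A) = P(T) = at/2 and P(G) = P(C) = (1 - at)/2."""
    p = {"A": at_fraction / 2, "T": at_fraction / 2,
         "G": (1 - at_fraction) / 2, "C": (1 - at_fraction) / 2}
    prob = 1.0
    for base in motif.upper():
        prob *= p[base]
    return prob

def expected_count(motif: str, at_fraction: float, length: int) -> float:
    """Expected occurrences of motif on one strand of a random sequence of `length` bp."""
    return (length - len(motif) + 1) * motif_probability(motif, at_fraction)

# AATAAA in an 1,800 bp sequence at ~50% versus ~62% A+T.
for at in (0.50, 0.62):
    print(at, round(expected_count("AATAAA", at, 1800), 2))
```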

As described above, the expression of B.t. toxin protein in plants has been problematic. Although the observations made in other systems described above offer the hope of a means to elevate the expression level of B.t. toxin proteins in plants, the success obtained by the present method is quite unexpected. Indeed, since it has recently been reported that expression of the full-length B.t.k. toxin protein in tobacco makes callus tissue necrotic (Barton et al., 1987), one would reasonably expect high-level expression of B.t. toxin protein to be unattainable because of the reported toxicity effects.

In its most rigorous application, the method of the present invention involves the modification of an existing structural coding sequence ("structural gene") that codes for a particular protein. The modification removes ATTTA sequences and putative polyadenylation signals by site-directed mutagenesis of the DNA comprising the structural gene. It is most preferred that substantially all the polyadenylation signals and ATTTA sequences are removed, although enhanced expression levels are observed with only partial removal of either of these sequences. Alternatively, if a synthetic gene is prepared that codes for the expression of the subject protein, codons are selected to avoid the ATTTA sequence and putative polyadenylation signals. For purposes of the present invention, putative polyadenylation signals include, but are not necessarily limited to, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA. In replacing the ATTTA sequences and polyadenylation signals, codons are preferably chosen to avoid those that are rarely found in plant genomes.
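A first pass over a coding sequence can simply list every ATTTA and every putative polyadenylation signal from the list above. This is a sketch under stated assumptions. It scans only the strand as written (the coding strand), it reports overlapping hits, and `scan_for_signals` is a hypothetical name.

```python
import re

# Putative plant polyadenylation signals listed in the text.
POLYA_SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
    "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
]

def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of motif in seq; the lookahead keeps overlapping hits."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def scan_for_signals(seq: str) -> dict[str, list[int]]:
    """Map each ATTTA / polyadenylation motif found on the given strand to its positions."""
    s = seq.upper()
    hits = {}
    for motif in ["ATTTA"] + POLYA_SIGNALS:
        positions = find_all(s, motif)
        if positions:
            hits[motif] = positions
    return hits

print(scan_for_signals("GGATTTAGGAATAAAGG"))
```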

Another embodiment of the present invention, represented in the flow diagram of Figure 1, employs a somewhat less rigorous method than the one first described above, either for the modification of an existing structural gene or, alternatively, for the de novo synthesis of a structural gene. Referring to Figure 1, the selected DNA sequence is scanned to identify regions with more than four consecutive adenine (A) or thymine (T) nucleotides. These A+T regions are then scanned for potential plant polyadenylation signals. The absence of five or more consecutive A or T nucleotides eliminates most plant polyadenylation signals. However, if more than one of the minor polyadenylation signals is identified within ten nucleotides of each other, the nucleotide sequence of this region is preferably altered to remove these signals while maintaining the original encoded amino acid sequence.
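The first step of Figure 1 can be sketched as two scans: one for runs of five or more A/T bases, and one for minor signals that lie close together. The sketch relies on assumptions. "Minor" is taken to mean every listed signal except the two starred as major in Table II (AATAAA, AATAAT). "Within ten nucleotides" is read as start positions no more than 10 nt apart, and only neighbouring hits in position order are paired. All names are hypothetical.

```python
import re

MAJOR = {"AATAAA", "AATAAT"}  # starred as potential major plant signals in Table II
ALL_SIGNALS = {
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
    "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
}
MINOR = ALL_SIGNALS - MAJOR  # assumption: everything not starred as major

def at_runs(seq: str, min_len: int = 5) -> list[tuple[int, int]]:
    """(start, end) of each run of >= min_len consecutive A/T bases; end is exclusive."""
    pattern = f"[AT]{{{min_len},}}"  # e.g. "[AT]{5,}"
    return [m.span() for m in re.finditer(pattern, seq.upper())]

def clustered_minor_signals(seq: str, window: int = 10):
    """Pairs of neighbouring minor-signal hits whose start positions are <= window apart."""
    s = seq.upper()
    hits = sorted((m.start(), sig) for sig in MINOR
                  for m in re.finditer(f"(?={sig})", s))
    return [(a, b) for a, b in zip(hits, hits[1:]) if b[0] - a[0] <= window]

print(at_runs("GGAATTAAGCCTTTTTG"))
print(clustered_minor_signals("AACCAAGATATAA"))
```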

The second step is to consider the 15 to 30 nucleotide regions surrounding the A+T rich region identified in step one. If the A+T content of the surrounding region is less than 80%, the region should be examined for polyadenylation signals. Alteration of the region based on polyadenylation signals depends upon (1) the number of polyadenylation signals present and (2) the presence of a major plant polyadenylation signal.
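The second step can be sketched as a function that widens an A+T run by a flank on each side, then reports the region's A+T content, how many listed signals it holds, and whether any is a major signal. This is a hypothetical sketch. The 20 nt flank is one choice within the 15-30 nt range given in the text, the 80% threshold is encoded exactly as the text states it, and a signal straddling the edge of the extended region is not counted.

```python
import re

MAJOR = {"AATAAA", "AATAAT"}
SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
    "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
]

def assess_extended_region(seq: str, run: tuple[int, int], flank: int = 20) -> dict:
    """Examine an A+T run (start, end-exclusive) plus `flank` nt on each side."""
    s = seq.upper()
    start = max(0, run[0] - flank)
    end = min(len(s), run[1] + flank)
    region = s[start:end]
    at = sum(1 for b in region if b in "AT") / len(region)
    # Positions are reported in whole-sequence coordinates.
    found = sorted((start + m.start(), sig) for sig in SIGNALS
                   for m in re.finditer(f"(?={sig})", region))
    return {
        "region": (start, end),
        "at_content": at,
        "examine_for_signals": at < 0.80,  # threshold as stated in the text
        "n_signals": len(found),
        "has_major_signal": any(sig in MAJOR for _, sig in found),
        "signals": found,
    }

seq = "GCGCGCGCGC" + "AATAAA" + "GCGCGCGCGC"
print(assess_extended_region(seq, (10, 16)))
```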

The extended region is examined for the presence of plant polyadenylation signals, which are removed by site-directed mutagenesis of the DNA sequence. The extended region is also examined for multiple copies of the ATTTA sequence, which are likewise removed by mutagenesis.

It is also preferred that regions comprising many consecutive A+T bases or G+C bases are disrupted, since these regions are predicted to have a higher likelihood of forming hairpin structures due to self-complementarity. Inserting heterogeneous base pairs would therefore reduce the likelihood of forming self-complementary secondary structures, which are known to inhibit transcription and/or translation in some organisms. In most cases, the adverse effects may be minimized by using sequences that contain no more than five consecutive A+T or G+C bases.
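The "no more than five consecutive A+T or G+C" guideline translates directly into a pattern search. The sketch below is illustrative, and `long_runs` is a hypothetical name. It flags runs longer than `max_len`. Because the synthetic-gene guideline later in this section says to avoid runs of five or more, `max_len=4` would apply that stricter rule.

```python
import re

def long_runs(seq: str, max_len: int = 5) -> list[tuple[int, str]]:
    """Runs of more than max_len consecutive A/T bases or consecutive G/C bases."""
    s = seq.upper()
    flagged = []
    for bases in ("AT", "GC"):
        pattern = f"[{bases}]{{{max_len + 1},}}"  # e.g. "[AT]{6,}"
        flagged.extend((m.start(), m.group()) for m in re.finditer(pattern, s))
    return sorted(flagged)

print(long_runs("ATATATGCGCGCAA"))
```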
SYNTHETIC OLIGONUCLEOTIDES FOR MUTAGENESIS

The oligonucleotides used in the mutagenesis are designed to maintain the proper amino acid sequence and reading frame. They are also preferably designed not to introduce common restriction sites such as BglII, HindIII, SacI, KpnI, EcoRI, NcoI, PstI and SalI into the modified gene, because these restriction sites are found in the multi-linker insertion sites of cloning vectors such as plasmids pUC118 and pMON7258. Of course, the introduction of new polyadenylation signals, ATTTA sequences or consecutive stretches of more than five A+T or G+C should also be avoided. The preferred size for the oligonucleotides is around 40-50 bases, but fragments ranging from 18 to 100 bases have been used. In most cases, a minimum of 5 to 8 base pairs of homology to the template DNA is maintained at both ends of the synthesized fragment to ensure proper hybridization of the primer to the template.
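The criteria above can be checked mechanically before an oligonucleotide is ordered. The sketch below is a hypothetical helper, not a described procedure. It uses the standard recognition sequences of the eight enzymes named in the text (all palindromic, so one strand suffices), the 18-100 base range, the ATTTA rule and the "more than five A+T or G+C" rule. `end_homology` assumes the oligo and the template segment it replaces are aligned base for base, with substitutions only.

```python
import re

# Standard recognition sequences of the enzymes named in the text.
AVOID_SITES = {
    "BglII": "AGATCT", "HindIII": "AAGCTT", "SacI": "GAGCTC", "KpnI": "GGTACC",
    "EcoRI": "GAATTC", "NcoI": "CCATGG", "PstI": "CTGCAG", "SalI": "GTCGAC",
}

def check_oligo(oligo: str) -> dict:
    """Report which of the text's design rules a mutagenic oligo breaks."""
    s = oligo.upper()
    return {
        "length_ok": 18 <= len(s) <= 100,
        "restriction_sites": sorted(n for n, site in AVOID_SITES.items() if site in s),
        "has_ATTTA": "ATTTA" in s,
        "long_runs": [m.group() for m in re.finditer(r"[AT]{6,}|[GC]{6,}", s)],
    }

def end_homology(oligo: str, template: str) -> tuple[int, int]:
    """Count identical bases at the 5' and 3' ends of an equal-length, gap-free alignment."""
    if len(oligo) != len(template):
        raise ValueError("sketch assumes equal-length, gap-free alignment")
    pairs = list(zip(oligo.upper(), template.upper()))
    left = 0
    for a, b in pairs:
        if a != b:
            break
        left += 1
    right = 0
    for a, b in reversed(pairs):
        if a != b:
            break
        right += 1
    return left, right

print(check_oligo("GCAGCAAGCTTGCAGCAGCA"))
print(end_homology("AAAAACGAAAAA", "AAAAATTAAAAA"))
```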

The oligonucleotides should avoid stretches of more than five A+T or G+C base pairs. Codons used to replace wild-type codons should preferably avoid the TA or CG doublet wherever possible. Codons are selected from a plant-preferred codon table (such as Table I below) so as to avoid codons that are rarely found in plant genomes, and codons should be chosen so as to bring the G+C content to about 50%.
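One way to apply these rules is to rank the synonymous codons for an amino acid. Codons free of TA and CG doublets come first, and higher plant usage breaks ties. The sketch uses only the Table I values reproduced in this section (Gly, Ile, Val, Lys, Asn, Gln, His, Glu, Asp, Tyr, Cys, Phe, Met, Trp), converted from RNA (U) to DNA (T). The ranking rule is one reasonable reading of the text, not a prescribed algorithm. The sketch does not check doublets that span codon junctions, and all names are hypothetical.

```python
# Subset of Table I (percent usage in plants), written as DNA codons (U -> T).
TABLE_I_SUBSET = {
    "G": {"GGA": 32, "GGC": 20, "GGG": 11, "GGT": 37},
    "I": {"ATA": 12, "ATC": 45, "ATT": 43},
    "V": {"GTA": 9, "GTC": 20, "GTG": 28, "GTT": 43},
    "K": {"AAA": 36, "AAG": 64},
    "N": {"AAC": 72, "AAT": 28},
    "Q": {"CAA": 64, "CAG": 36},
    "H": {"CAC": 65, "CAT": 35},
    "E": {"GAA": 48, "GAG": 52},
    "D": {"GAC": 48, "GAT": 52},
    "Y": {"TAC": 68, "TAT": 32},
    "C": {"TGC": 78, "TGT": 22},
    "F": {"TTC": 56, "TTT": 44},
    "M": {"ATG": 100},
    "W": {"TGG": 100},
}

def has_avoided_doublet(codon: str) -> bool:
    """True if the codon contains the TA or CG doublet."""
    return "TA" in codon or "CG" in codon

def ranked_codons(aa: str) -> list[str]:
    """Synonymous codons: doublet-free first, then by descending plant usage."""
    usage = TABLE_I_SUBSET[aa]
    return sorted(usage, key=lambda c: (has_avoided_doublet(c), -usage[c]))

def gc_content(seq: str) -> float:
    """Fraction of G or C bases, for steering a design toward ~50% G+C."""
    s = seq.upper()
    return sum(1 for b in s if b in "GC") / len(s)

print(ranked_codons("V"))
print(ranked_codons("Y"))
```

For tyrosine, both codons contain TA, so the doublet cannot be avoided and usage alone decides. This is the kind of case the text's "wherever possible" allows for.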
Percent Usage

7   11   5   25   29   23
8   20   10   28   5   30
14   26   3   21   21   15
21   41   7   31
45   19   9   26
23   32   3   41

Table I - continued

Preferred Codon Usage in Plants

Amino Acid   Codon   Percent Usage in Plants
GLY          GGA     32
             GGC     20
             GGG     11
             GGU     37
ILE          AUA     12
             AUC     45
             AUU     43
VAL          GUA     9
             GUC     20
             GUG     28
             GUU     43
LYS          AAA     36
             AAG     64
ASN          AAC     72
             AAU     28
GLN          CAA     64
             CAG     36
HIS          CAC     65
             CAU     35
GLU          GAA     48
             GAG     52
ASP          GAC     48
             GAU     52
TYR          UAC     68
             UAU     32
CYS          UGC     78
             UGU     22
PHE          UUC     56
             UUU     44
MET          AUG     100
TRP          UGG     100

Regions with many consecutive A+T bases or G+C bases are predicted to have a higher likelihood of forming hairpin structures due to self-complementarity. Disrupting these regions by inserting heterogeneous base pairs is preferred. This should reduce the likelihood of forming self-complementary secondary structures such as hairpins, which are known in some organisms to inhibit transcription (transcriptional terminators) and translation (attenuators). However, it is difficult to predict the biological effect of a potential hairpin-forming region.
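A crude way to flag candidate hairpins is to look for short inverted repeats, that is, a stem sequence followed, after a short loop, by its reverse complement. The sketch below is only a screening heuristic under assumed parameters: 6-base stems, 3-8 base loops, exact Watson-Crick pairing, and no G-U wobble or free-energy calculation. As the text notes, whether such a region has any biological effect is hard to predict, and a thermodynamic folding program would be the usual next step. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """(i, j) pairs where seq[i:i+stem] and seq[j:j+stem] are reverse complements
    separated by a loop of min_loop..max_loop bases: a crude stem-loop candidate."""
    s = seq.upper()
    hits = []
    for i in range(len(s) - stem + 1):
        target = reverse_complement(s[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(s):
                break  # longer loops would run off the end of the sequence
            if s[j:j + stem] == target:
                hits.append((i, j))
    return hits

print(inverted_repeats("GCCAGT" + "TTTT" + "ACTGGC"))
```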

It will be evident to those skilled in the art that, although the above description is directed toward the modification of the DNA sequences of wild-type genes, the present method can also be used to construct a completely synthetic gene for a given amino acid sequence. Regions with five or more consecutive A+T or G+C nucleotides should be avoided. Codons should be selected to avoid the TA and CG doublets whenever possible. Codon usage can be normalized against a plant-preferred codon usage table (such as Table I), and the G+C content is preferably adjusted to about 50%. The resulting sequence should be examined to ensure that it contains a minimum of putative plant polyadenylation signals and ATTTA sequences. Restriction sites found in commonly used cloning vectors are also preferably avoided. However, placing several unique restriction sites throughout the gene is useful for analysis of gene expression or construction of gene variants.
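A synthetic design along these lines can be sketched as a greedy back-translation. For each amino acid, the sketch takes the best-ranked codon (doublet-free first, then highest Table I usage) whose addition does not create ATTTA or a run of five or more A/T or G/C bases. This is a hypothetical sketch, limited to the amino acids whose Table I values appear in this section. It does not normalize overall codon usage, steer G+C toward 50%, or scan for polyadenylation signals, which would be separate checks. When every codon for a position causes a problem, it keeps the top-ranked one and leaves the problem for manual review.

```python
import re

# Subset of Table I as DNA codons (U -> T); only amino acids reproduced in this section.
TABLE = {
    "G": {"GGA": 32, "GGC": 20, "GGG": 11, "GGT": 37},
    "I": {"ATA": 12, "ATC": 45, "ATT": 43},
    "V": {"GTA": 9, "GTC": 20, "GTG": 28, "GTT": 43},
    "K": {"AAA": 36, "AAG": 64}, "N": {"AAC": 72, "AAT": 28},
    "Q": {"CAA": 64, "CAG": 36}, "H": {"CAC": 65, "CAT": 35},
    "E": {"GAA": 48, "GAG": 52}, "D": {"GAC": 48, "GAT": 52},
    "Y": {"TAC": 68, "TAT": 32}, "C": {"TGC": 78, "TGT": 22},
    "F": {"TTC": 56, "TTT": 44}, "M": {"ATG": 100}, "W": {"TGG": 100},
}

def violates(seq: str) -> bool:
    """True if seq contains ATTTA or five or more consecutive A/T or G/C bases."""
    return "ATTTA" in seq or re.search(r"[AT]{5,}|[GC]{5,}", seq) is not None

def back_translate(protein: str) -> str:
    """Greedy codon choice for a one-letter protein sequence."""
    dna = ""
    for aa in protein.upper():
        usage = TABLE[aa]
        # Prefer codons without TA/CG doublets, then higher plant usage.
        options = sorted(usage, key=lambda c: ("TA" in c or "CG" in c, -usage[c]))
        chosen = options[0]  # fallback if every option creates a problem
        for codon in options:
            # Only the new codon plus the 4 preceding bases can form a new
            # 5-base run or ATTTA, so only that window needs checking.
            if not violates((dna + codon)[-(len(codon) + 4):]):
                chosen = codon
                break
        dna += chosen
    return dna

print(back_translate("MKW"))
```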

Plant Gene Construction 




The expression of a plant gene which exists in 
double-stranded DNA form involves transcription of 
messenger RNA (mRNA) from one strand of the DNA by RNA 
polymerase enzyme, and the subsequent processing of 
the mRNA primary transcript inside the nucleus. This 
processing involves a 3' non-translated region which 
adds polyadenylate nucleotides to the 3’ end of the 
RNA. Transcription of DNA into mRNA is regulated by a 
region of DNA usually referred to as the "promoter." 
The promoter region contains a sequence of bases that 
signals RNA polymerase to associate with the DNA and 
to initiate the transcription of mRNA using one of the 
DNA strands as a template to make a corresponding 
strand of RNA. 

A number of promoters that are active in plant cells have been described in the literature. These include the nopaline synthase (NOS) and octopine synthase (OCS) promoters (which are carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the Cauliflower Mosaic Virus (CaMV) 19S and 35S promoters, the light-inducible promoter from the small subunit of ribulose bis-phosphate carboxylase (ssRUBISCO, a very abundant plant polypeptide), and the mannopine synthase (MAS) promoter (Velten et al., 1984 and Velten & Schell, 1985). All of these promoters have been used to create various types of DNA constructs that have been expressed in plants (see, e.g., PCT publication WO84/02913 (Rogers et al., Monsanto)).

Promoters that are known or are found to cause transcription of RNA in plant cells can be used in the present invention. Such promoters may be obtained from plants or plant viruses and include, but are not limited to, the CaMV35S promoter and promoters isolated from plant genes such as ssRUBISCO genes. As described below, the particular promoter selected should preferably cause enough expression to produce an effective amount of protein.

The promoters used in the DNA constructs (i.e., chimeric plant genes) of the present invention may be modified, if desired, to affect their control characteristics. For example, the CaMV35S promoter may be ligated to the portion of the ssRUBISCO gene that represses the expression of ssRUBISCO in the absence of light, to create a promoter that is active in leaves but not in roots. The resulting chimeric promoter may be used as described herein. For purposes of this description, the phrase "CaMV35S promoter" thus includes variations of the CaMV35S promoter, e.g., promoters derived by ligation with operator regions, random or controlled mutagenesis, etc. Furthermore, the promoters may be altered to contain multiple "enhancer sequences" to assist in elevating gene expression.

The RNA produced by a DNA construct of the present invention also contains a 5' non-translated leader sequence. This sequence can be derived from the promoter selected to express the gene, and can be specifically modified to increase translation of the mRNA. The 5' non-translated regions can also be obtained from viral RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The present invention is not limited to the constructs presented in the following examples. Rather, the non-translated leader sequence can be part of the 5' end of the non-translated region of the coding sequence for the virus coat protein, part of the promoter sequence, or derived from an unrelated promoter or coding sequence. In any case, the sequence flanking the initiation site preferably conforms to the translational consensus sequence rules for enhanced translation initiation reported by Kozak (1984).
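Kozak's consensus for the region around the initiation codon is commonly summarized as CCRCCATGG. Of its positions, a purine at -3 and a G at +4 (counting the A of ATG as +1) are usually cited as the most influential. The sketch below is a hypothetical helper that checks only those two positions. It is a rough screen, not a full implementation of Kozak's rules, and it accepts RNA or DNA input.

```python
def kozak_context(seq: str, atg_index: int) -> dict:
    """Check the -3 (purine) and +4 (G) positions around the ATG at atg_index."""
    s = seq.upper().replace("U", "T")
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")
    minus3 = s[atg_index - 3] if atg_index >= 3 else None
    plus4 = s[atg_index + 3] if atg_index + 3 < len(s) else None
    return {"purine_at_minus3": minus3 in ("A", "G"), "G_at_plus4": plus4 == "G"}

print(kozak_context("GCCACCATGG", 6))
```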

The DNA construct of the present invention also contains a modified or fully synthetic structural coding sequence that has been changed to enhance the performance of the gene in plants. In a particular embodiment of the present invention, the enhancement method has been applied to design modified and fully synthetic genes encoding the crystal toxin protein of Bacillus thuringiensis. The structural genes of the present invention may optionally encode a fusion protein comprising an amino-terminal chloroplast transit peptide or secretory signal sequence (see, for instance, Examples 10 and 11).

The DNA construct also contains a 3' non-translated region. The 3' non-translated region contains a polyadenylation signal that functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the viral RNA. Examples of suitable 3' regions are (1) the 3' transcribed, non-translated regions containing the polyadenylation signal of Agrobacterium tumor-inducing (Ti) plasmid genes, such as the nopaline synthase (NOS) gene, and (2) plant genes like the soybean storage protein (7S) genes and the small subunit of the RuBP carboxylase (E9) gene. An example of a preferred 3' region is that from the 7S gene, described in greater detail in the examples below.

Plant Transformation

A chimeric plant gene containing a structural coding sequence of the present invention can be inserted into the genome of a plant by any suitable method. Suitable plants for use in the practice of the present invention include, but are not limited to, soybean, cotton, alfalfa, oilseed rape, flax, tomato, sugarbeet, sunflower, potato, tobacco, maize, rice and wheat. Suitable plant transformation vectors include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed, e.g., by Herrera-Estrella (1983), Bevan (1983), Klee (1985) and EPO publication 120,516 (Schilperoort et al.). In addition to plant transformation vectors derived from the Ti or root-inducing (Ri) plasmids of Agrobacterium, alternative methods can be used to insert the DNA constructs of this invention into plant cells. Such methods may involve, for example, the use of liposomes, electroporation, chemicals that increase free DNA uptake, free DNA delivery via microprojectile bombardment, and transformation using viruses or pollen.

A particularly useful Ti plasmid cassette vector for transformation of dicotyledonous plants is shown in Figure 5. Referring to Figure 5, the expression cassette pMON893 consists of the enhanced CaMV35S promoter (EN 35S) and the 3' end, including polyadenylation signals, from a soybean gene encoding the alpha-prime subunit of beta-conglycinin. Between these two elements is a multilinker containing multiple restriction sites for the insertion of genes.

The enhanced CaMV35S promoter was constructed as follows. A fragment of the CaMV35S promoter extending between positions -343 and +9 was previously constructed in pUC13 by Odell et al. (1985). This segment contains a region identified by Odell et al. (1985) as necessary for maximal expression of the CaMV35S promoter. It was excised as a ClaI-HindIII fragment, made blunt ended with DNA polymerase I (Klenow fragment) and inserted into the HincII site of pUC18. This upstream region of the 35S promoter was excised from this plasmid as a HindIII-EcoRV fragment (extending from -343 to -90) and inserted into the same plasmid between the HindIII and PstI sites. The enhanced CaMV35S promoter thus contains a duplication of sequences between -343 and -90 (Kay et al., 1987).

The 3' end of the 7S gene is derived from the 7S gene contained on the clone designated 17.1 (Schuler et al., 1982). This 3' end fragment, which includes the polyadenylation signals, extends from an AvaII site located about 30 bp upstream of the termination codon for the beta-conglycinin gene in clone 17.1 to an EcoRI site located about 450 bp downstream of this termination codon.

The remainder of pMON893 contains a segment of pBR322, which provides an origin of replication in E. coli and a region for homologous recombination with the disarmed T-DNA in Agrobacterium strain ACO (described below); the oriV region from the broad host range plasmid RK1; the streptomycin/spectinomycin resistance gene from Tn7; and a chimeric NPTII gene, containing the CaMV35S promoter and the nopaline synthase (NOS) 3' end, which provides kanamycin resistance in transformed plant cells.

Referring to Figure 6, transformation vector plasmid pMON900 is a derivative of pMON893. The enhanced CaMV35S promoter of pMON893 has been replaced with the 1.5 kb mannopine synthase (MAS) promoter (Velten et al., 1984). The other segments are the same as in plasmid pMON893. After a DNA construct has been incorporated into plasmid vector pMON893 or pMON900, the intermediate vector is introduced into A. tumefaciens strain ACO, which contains a disarmed Ti plasmid. Cointegrate Ti plasmid vectors are selected and used to transform dicotyledonous plants.

Referring to Figure 7, A. tumefaciens ACO is a disarmed strain similar to pTiB6SE described by Fraley et al. (1985). For construction of ACO, the starting Agrobacterium strain was the strain A208, which contains a nopaline-type Ti plasmid. The Ti plasmid was disarmed in a manner similar to that described by Fraley et al. (1985), so that essentially all of the native T-DNA was removed except for the left border and a few hundred base pairs of T-DNA inside the left border. The remainder of the T-DNA, extending to a point just beyond the right border, was replaced with a novel piece of DNA including (from left to right) a segment of pBR322, the oriV region from plasmid RK2, and the kanamycin resistance gene from Tn601. The pBR322 and oriV segments are similar to the segments in pMON893 and provide a region of homology for cointegrate formation.

The following examples are provided to better elucidate the practice of the present invention and should not be interpreted in any way to limit the scope of the present invention. Those skilled in the art will recognize that various modifications, truncations, etc. can be made to the methods and genes described herein without departing from the spirit and scope of the present invention.
Example 1 - Modified B.t.k. HD-1 Gene

Referring to Figure 2, the wild-type B.t.k. HD-1 gene is known to be expressed poorly in plants, both as a full-length gene and as a truncated gene. The G+C content of the B.t.k. gene is low (37%), and the gene contains many A+T rich regions, potential polyadenylation sites (18 sites; see Table II for the list of sequences) and numerous ATTTA sequences.

Table II

List of Sequences of the Potential Polyadenylation Signals

AATAAA*     AAGCAT
AATAAT*     ATTAAT
AACCAA      ATACAT
ATATAA      AAAATA
AATCAA      ATTAAA**
ATACTA      AATTAA**
ATAAAA      AATACA**
ATGAAA      CATAAA**

* indicates a potential major plant polyadenylation site.
** indicates a potential minor animal polyadenylation site.
All others are potential minor plant polyadenylation sites.
Table III lists the synthetic oligonucleotides designed and synthesized for the site-directed mutagenesis of the B.t.k. HD-1 gene.

Table III

Mutagenesis Primers for B.t.k. HD-1 Gene

Primer      Length (bp)   Sequence
BTK185      18            TCCCCAGATA ATATCAAC
BTK240      48            GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG AGCTGTTC
BTK462      54            CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG GAGAGGAGAG GAAC
BTK669      48            AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC CAATCTCT
BTK930      39            AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT
BTK1110     32            AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG
BTK1380A    37            GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG
BTK1380T    100           CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG GATTTGGGTG
                          ATTTGTGATG AAGGGATGAT GTTGTTGAAC TCAGCACTAC GATGTATCCA
BTK1600     27            TGATGTGTGG AACTGAAGGT TTGTGGT
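Because Table III had to be reassembled from fragmented text, a quick consistency check is useful. It confirms that each sequence has the stated length and reports its G+C content. The script below is a transcription aid only, not part of the described method. `TABLE_III` is a hypothetical name, and the sequences are copied from the table with their 10-base spacing.

```python
TABLE_III = {
    "BTK185": (18, "TCCCCAGATA ATATCAAC"),
    "BTK240": (48, "GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG AGCTGTTC"),
    "BTK462": (54, "CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG GAGAGGAGAG GAAC"),
    "BTK669": (48, "AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC CAATCTCT"),
    "BTK930": (39, "AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT"),
    "BTK1110": (32, "AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG"),
    "BTK1380A": (37, "GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG"),
    "BTK1380T": (100, "CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG GATTTGGGTG "
                      "ATTTGTGATG AAGGGATGAT GTTGTTGAAC TCAGCACTAC GATGTATCCA"),
    "BTK1600": (27, "TGATGTGTGG AACTGAAGGT TTGTGGT"),
}

for name, (stated, spaced) in TABLE_III.items():
    seq = spaced.replace(" ", "")
    gc = sum(1 for b in seq if b in "GC") / len(seq)
    status = "ok" if len(seq) == stated else f"MISMATCH ({len(seq)})"
    print(f"{name:9s} {stated:4d} bp  G+C {gc:.0%}  {status}")
```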



The B.t.k. HD-1 gene (BglII fragment from pMON9921, encoding amino acids 29-607 with a Met-Ala at the N-terminus) was cloned into pMON7258 (a pUC118 derivative that contains a BglII site in the multilinker cloning region) at the BglII site, resulting in pMON5342. The orientation of the B.t.k. gene was chosen so that the opposite strand (negative strand) was synthesized in filamentous phage particles for the mutagenesis. The procedure of Kunkel (1985) was used for the mutagenesis, with plasmid pMON5342 as starting material.

The regions for mutagenesis were selected in the following manner. All regions of the DNA sequence of the B.t.k. gene that contained five or more consecutive A or T base pairs were identified. These were ranked by length and by the percentage of A+T in the surrounding sequence over a 20-30 base pair region. The DNA was then analyzed for regions that might contain polyadenylation sites (see Table II above) or ATTTA sequences. Oligonucleotides were designed to maximize the elimination of consecutive A+T regions that contained one or more polyadenylation sites or ATTTA sequences. Two potential plant polyadenylation sites were rated more critical (see Table II) based on published reports. The codons selected increased G+C content, did not generate restriction sites for enzymes useful for cloning and assembly of the modified gene (BamHI, BglII, SacI, NcoI, EcoRV), and did not contain the doublets TA or CG, which have been reported to be infrequently found in codons in plants. The oligonucleotides ranged from 18 up to 100 base pairs in length and contained at least 5-8 base pairs of direct homology to native sequences at the ends of the fragments for efficient hybridization and priming in site-directed mutagenesis reactions.
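The ranking step described above can be sketched as follows. The sketch finds runs of five or more A/T bases and orders them by run length, breaking ties by the A+T content of a surrounding window. The 25 bp window centred on each run is an assumption chosen within the 20-30 bp range in the text, and `rank_at_runs` is a hypothetical name.

```python
import re

def rank_at_runs(seq: str, min_len: int = 5, window: int = 25) -> list[dict]:
    """A/T runs of >= min_len, longest first; ties broken by window A+T content."""
    s = seq.upper()
    ranked = []
    for m in re.finditer(f"[AT]{{{min_len},}}", s):
        start, end = m.span()
        centre = (start + end) // 2
        lo = max(0, centre - window // 2)
        hi = min(len(s), lo + window)
        region = s[lo:hi]
        at = sum(1 for b in region if b in "AT") / len(region)
        ranked.append({"start": start, "length": end - start, "window_at": round(at, 3)})
    ranked.sort(key=lambda r: (-r["length"], -r["window_at"]))
    return ranked

example = "GCGC" + "A" * 7 + "GCGCGC" + "T" * 5 + "GCGC"
for run in rank_at_runs(example):
    print(run)
```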

Figure 2 compares the wild-type B.t.k. HD-1 gene sequence with the sequence resulting from the modifications by site-directed mutagenesis.

The end result of these changes was to increase the G+C content of the B.t.k. gene from 37% to 41%, while decreasing the potential plant polyadenylation sites from 18 to 7 and the ATTTA regions from 13 to 7. Specifically, the mutagenesis changes from the amino (5') terminus to the carboxy (3') terminus are as follows:

BTK185 is an 18-mer used to eliminate a plant 
polyadenylation site in the midst of a nine base pair 
region of A+T. 

BTK240 is a 48-mer. Seven base pairs were changed 
by this oligonucleotide to eliminate three potential 
polyadenylation sites (2 AACCAA, 1 AATTAA). Another 
region close to the region altered by BTK240, starting 
at bp 312, had a high A+T content (13 of 15 base 
pairs) and an ATTTA region. However, it did not 
contain a potential polyadenylation site and its 
longest string of uninterrupted A+T was seven base 
pairs. 

BTK462 is a 54-mer introducing 13 base pair changes. The first six changes reduce the A+T richness of the gene by replacing wild-type codons with codons containing G and C while avoiding the CG doublet. The next seven changes made by BTK462 eliminate an A+T rich region (13 of 14 base pairs were A or T) containing two ATTTA regions.

BTK669 is a 48-mer making nine individual base pair 
changes eliminating three possible polyadenylation 
sites (ATATAA, AATCAA, and AATTAA) and a single ATTTA 
site. 

BTK930 is a 39-mer designed to increase the G+C content and to eliminate a potential polyadenylation site (AATAAT, a major site). This region did contain a nine base pair region of consecutive A+T sequence. One of the base pair changes was a G to A, because a G at this position would have created a G+C rich region (CCGG(G)C). Since sequencing reactions indicate that generating sequence through consecutive G+C bases can be difficult, it was thought prudent to avoid creating potentially problematic regions, even if they were problematic only in vitro.

BTK1110 is a 32-mer designed to introduce five changes in the wild-type gene. One potential site (AATAAT, a major site) was eliminated in the midst of an A+T rich region (19 of 22 base pairs).

BTK1380A and BTK1380T are responsible for 14 individual base pair changes. The first region (1380A) has 17 consecutive A+T base pairs and contains an ATTTA and a potential polyadenylation site (AATAAT). The 100-mer (1380T) contains all the changes dictated by 1380A. The large size of this primer was in part an experiment to determine whether large oligonucleotides (over 60 bases in length) could feasibly be used for mutagenesis. A second consideration was that the 100-mer was used to mutagenize a template that had previously been mutagenized by 1380A. The original primer ordered to mutagenize the region downstream of and adjacent to 1380A did not anneal efficiently to the desired site, as shown by an inability to obtain clean sequence with that primer. The large region of homology of 1380T did assure proper annealing. The extended size of 1380T was more a convenience than a necessity. The second region adjacent to 1380A covered by 1380T has a high A+T content (22 of 29 bases are A or T).
BTK1600 is a 27-mer responsible for five individual base pair changes. An ATTTA region and a plant polyadenylation site were identified and the appropriate changes engineered.

A total of 62 bases were changed by site-directed mutagenesis. The G+C content increased by 55 base pairs, the potential polyadenylation sites were reduced from 18 to seven, and the ATTTA sequences decreased from 13 to seven. The changes in the DNA sequence altered 55 of the 579 codons (approximately 9.5%) in the truncated B.t.k. gene in pMON5342.

Referring to Table IV, modified B.t.k. HD-1 genes were constructed that contained all of the above modifications (pMON5370) or various subsets of individual modifications. These genes were inserted into pMON893 for plant transformation, and tobacco plants containing these genes were analyzed. Tobacco plants with the individual modifications were analyzed for several reasons. Expression of the wild-type truncated gene in tobacco is very poor, so plants toxic to tobacco hornworm (THW) are identified only infrequently. Toxicity is defined in leaf feeding assays as at least 60% mortality of tobacco hornworm neonate larvae with a damage rating of 1 or less (scale is 0 to 4; 0 is equivalent to total protection, 4 to total damage). The modified HD-1 gene (pMON5370) shows a large increase in expression in tobacco (estimated to be approximately 100-fold; see Table VIII). Therefore, any increase in expression of the wild-type gene due to an individual modification would be apparent as a large increase in the frequency of toxic tobacco plants and in the presence of detectable B.t.k. protein. Results are shown in the following table:

Table IV

Construct    Modifications (oligonucleotides)      Plants analyzed   Toxic plants
pMON5370     185,240,669,930,1110,1380a+b,1600     38                22
pMON10707    185,240,462,669                       48                19
pMON10706    930,1110,1380a+b,1600                 43                1
pMON10539    185                                   55                2
pMON10537    240                                   57                17
pMON10540    185,240                               88                23
pMON10705    462                                   47                1
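Converting the Table IV counts to percentages makes the comparison in the following discussion easier to follow. The sketch reads the two numeric columns as plants analyzed and toxic plants. That reading matches the counts quoted in the text (for example, 19 of 48 for pMON10707 and 23 of 88 for pMON10540). The names are hypothetical.

```python
TABLE_IV = {  # construct: (plants analyzed, toxic plants)
    "pMON5370": (38, 22),
    "pMON10707": (48, 19),
    "pMON10706": (43, 1),
    "pMON10539": (55, 2),
    "pMON10537": (57, 17),
    "pMON10540": (88, 23),
    "pMON10705": (47, 1),
}

def percent_toxic(plants: int, toxic: int) -> float:
    """Percentage of analyzed plants scored as toxic."""
    return 100.0 * toxic / plants

ranking = sorted(TABLE_IV.items(), key=lambda kv: percent_toxic(*kv[1]), reverse=True)
for name, (plants, toxic) in ranking:
    print(f"{name:10s} {toxic:3d}/{plants:<3d} {percent_toxic(plants, toxic):5.1f}% toxic")
```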


The effects of each individual oligonucleotide's changes on expression revealed some overall trends. Six different constructs were generated to identify the key regions. The nine different oligonucleotides were divided in half by their position on the gene. Changes in the N-terminal half were incorporated into pMON10707 (185, 240, 462, 669); C-terminal half changes were incorporated into pMON10706 (930, 1110, 1380a+b, 1600). Analysis of plants with these two constructs indicates that pMON10707 produces a substantial number of toxic plants (19 of 48), and protein from these plants is detectable by ELISA analysis. pMON10706 plants were rarely identified as insecticidal (1 of 43), and B.t.k. levels were barely detectable by immunological analysis. The N-terminal changes were investigated in greater detail with four pMON constructs: 10539 (185 alone), 10537 (240 alone), 10540 (185 and 240) and 10705 (462 alone). The results indicate that the changes in 240 are required to generate a substantial number of toxic plants (pMON10540, 23 of 88; pMON10537, 17 of 57). The absence of the 240 changes resulted in a low frequency of toxic plants with low B.t.k. protein levels, identical to results with the wild-type gene. These results indicate that the changes in 240 are responsible for a substantial increase in B.t.k. expression levels over an analogous wild-type construct in tobacco. Changes in additional regions (185, 462, 669) in conjunction with 240 may further increase B.t.k. expression (>2-fold). However, it is the changes at the 240 region of the N-terminal portion of the gene that result in dramatic increases in expression.
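The comparison above can be summarized numerically from the counts in Table IV. The following Python sketch computes the fraction of toxic plants per construct and pools constructs with and without the 240-region changes; the dictionary `TABLE_IV` and both function names are hypothetical, and pooling across constructs is an illustrative simplification rather than an analysis performed in the original work.

```python
# Counts transcribed from Table IV:
# vector -> (regions changed, plants analyzed, toxic plants)
TABLE_IV = {
    "pMON5370": ({"185", "240", "669", "930", "1110", "1380a+b", "1600"}, 38, 22),
    "pMON10707": ({"185", "240", "462", "669"}, 48, 19),
    "pMON10706": ({"930", "1110", "1380a+b", "1600"}, 43, 1),
    "pMON10539": ({"185"}, 55, 2),
    "pMON10537": ({"240"}, 57, 17),
    "pMON10540": ({"185", "240"}, 88, 23),
    "pMON10705": ({"462"}, 47, 1),
}


def toxic_fraction(plants: int, toxic: int) -> float:
    """Fraction of transformed plants scored as toxic."""
    return toxic / plants


def pooled_fraction(has_240: bool) -> float:
    """Pool all constructs that do (or do not) carry the 240-region changes."""
    plants = toxic = 0
    for regions, n, k in TABLE_IV.values():
        if ("240" in regions) == has_240:
            plants += n
            toxic += k
    return toxic / plants


for vector, (regions, n, k) in TABLE_IV.items():
    print(f"{vector:10s} {k:3d}/{n:<3d} {toxic_fraction(n, k):6.1%}")
print(f"with 240 changes:    {pooled_fraction(True):.1%}")
print(f"without 240 changes: {pooled_fraction(False):.1%}")
```

With these counts, pooling gives about 35% toxic plants (81 of 231) for constructs carrying the 240 changes and under 3% (4 of 145) for constructs lacking them.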
Despite the importance of the alteration of the 240 region in expression of modified genes, increased expression can also be achieved by alteration of other regions. Hybrid genes, part wild-type and part synthetic, were generated to determine the effects of synthetic gene segments on the levels of B.t.k. expression. A hybrid gene was generated with a synthetic N-terminal third (base pairs 1 to 590 of Figure 2, up to the XbaI site) and the C-terminal wild-type B.t.k. HD-1 (pMON5378). Plants transformed with this vector were as toxic as plants transformed with the modified HD-1 gene (pMON5370). This is consistent with the alteration of the 240 region, which lies within the synthetic N-terminal segment. However, pMON10538, a hybrid with a wild-type N-terminal third (wild-type gene for the first 600 base pairs, to the second XbaI site) and a synthetic C-terminal two-thirds (base pairs 590 to 1845 of Figure 3), was used to transform tobacco and resulted in a dramatic increase in expression. The levels of expression do not appear to be as high as those seen with the synthetic gene, but are comparable to the modified gene levels. These results indicate that modification of the 240 segment is not essential to increased expression, since pMON10538 has an intact 240 region. A fully synthetic gene is, in most cases, superior for expression levels of B.t.k. (See Example 2.)

Example 2 — Fully Synthetic B.t.k. HD-1 Gene

A synthetic B.t.k. HD-1 gene was designed using the preferred plant codons listed in Table V below. Table V lists the codons and their frequency of use in genes of dicotyledonous plants compared to their frequency of use in the wild-type B.t.k. HD-1 gene (amino acids 1-615) and in the synthetic gene of this example. The total number of each amino acid in this segment of the gene is listed in parentheses under the amino acid designation.


Table V

Codon Usage in Synthetic B.t.k. HD-1 Gene

                          Percent Usage in
Amino Acid   Codon   Plants   Wt B.t.k.   Syn
ARG (43)     CGA        7        11         2
             CGC       11         5         5
             CGG        5         2         0
             CGU       25        14        27
             AGA       29        55        41
             AGG       23        14        25
LEU (49)     CUA        8        16         4
             CUC       20         0        20
             CUG       10         2         6
             CUU       28        22        24
             UUA        5        50         0
             UUG       30        10        45
SER (64)     UCA       14        27         5
             UCC       26         9        28
             UCG        3         8         0
             UCU       21        19        31
             AGC       21         6        32
             AGU       15        31         5
THR (42)     ACA       21        31        14
             ACC       41        19        53
             ACG        7        14         0
             ACU       31        36        33
PRO (34)     CCA       45        35        53
             CCC       19         6        12
             CCG        9        21         3
             CCU       26        38        32
ALA (31)     GCA       23        38        26
             GCC       32         9        29
             GCG        3         3         0
             GCU       41        50        45
GLY (46)     GGA       32        52        45
             GGC       20        17        15
             GGG       11        15         6
             GGU       37        15        34
ILE (46)     AUA       12        39         2
             AUC       45        11        67
             AUU       43        50        30
VAL (38)     GUA        9        45         3
             GUC       20         5        16
             GUG       28        11        37
             GUU       43        39        45
LYS (3)      AAA       36       100        33
             AAG       64         0        67
ASN (44)     AAC       72        27        80
             AAU       28        73        20
GLN (31)     CAA       64        77        61
             CAG       36        23        39
HIS (10)     CAC       65         0        80
             CAU       35       100        20
GLU (30)     GAA       48        87        50
             GAG       52        13        50
ASP (23)     GAC       48        17        65
             GAU       52        83        35
TYR (25)     UAC       68        20        72
             UAU       32        80        28
CYS (2)      UGC       78        50       100
             UGU       22        50         0
PHE (36)     UUC       56        17        83
             UUU       44        83        17
MET (9)      AUG      100       100       100
TRP (9)      UGG      100       100       100
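Each column of Table V is a per-amino-acid percentage: the number of times a codon is used divided by the total occurrences of its amino acid. A minimal Python sketch of that calculation is shown below, assuming the standard genetic code and an in-frame coding sequence; `codon_usage_percent` and the table constants are illustrative names, and the dicot plant column comes from a separate compilation that is not recomputed here.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_percent(cds: str) -> dict:
    """Percent of each amino acid's occurrences encoded by each synonymous codon.

    Accepts DNA or RNA letters; codons are reported in RNA form (U) to match
    Table V. Stop codons are skipped.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    by_aa = defaultdict(Counter)
    for pos in range(0, len(cds), 3):
        codon = cds[pos:pos + 3]
        aa = CODON_TABLE[codon]
        if aa != "*":
            by_aa[aa][codon.replace("T", "U")] += 1
    return {
        aa: {codon: 100.0 * n / sum(counts.values()) for codon, n in counts.items()}
        for aa, counts in by_aa.items()
    }


usage = codon_usage_percent("ATGCTTCTCCTTTAA")  # short made-up example
print(usage)
```

Applied to the wild-type and synthetic sequences of Figure 3 for amino acids 1-615, a calculation of this kind should yield figures comparable to the Wt and Syn columns.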





The resulting synthetic gene lacks ATTTA sequences, contains only one potential polyadenylation site and has a G+C content of 48.5%. Figure 3 is a comparison of the wild-type HD-1 sequence to the synthetic gene sequence for amino acids 1-615. There is approximately 77% DNA homology between the synthetic gene and the wild-type gene, and 356 of the 615 codons have been changed (approximately 60%).
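The three sequence properties cited for the synthetic gene (G+C content, ATTTA count and number of potential polyadenylation sites) can be checked with a simple scan. This Python sketch is an illustration under stated assumptions: the two polyadenylation motifs in `POLYA_MOTIFS` are placeholders for the fuller list of putative plant signals used by the design algorithm of Figure 1, and the function names are hypothetical.

```python
import re

# Illustrative motifs only: stand-ins for the fuller list of putative plant
# polyadenylation signals used by the design algorithm (not reproduced here).
POLYA_MOTIFS = ("AATAAA", "AATAAT")


def gc_content(seq: str) -> float:
    """Percent G+C in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of a motif, including overlapping ones (zero-width lookahead)."""
    return len(re.findall(f"(?={re.escape(motif.upper())})", seq.upper()))


def feature_summary(seq: str) -> dict:
    """G+C percent, ATTTA count and count of the placeholder polyadenylation motifs."""
    return {
        "gc_percent": round(gc_content(seq), 1),
        "ATTTA": count_motif(seq, "ATTTA"),
        "polyA_signals": sum(count_motif(seq, m) for m in POLYA_MOTIFS),
    }


print(feature_summary("GCTAATAAAGCATTTAGC"))  # made-up fragment
```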


Example 3 — Synthetic B.t.k. HD-73 Gene 

The crystal protein toxin from B.t.k. HD-73 exhibits a higher unit activity against some important agricultural pests. The toxin proteins of HD-1 and HD-73 exhibit substantial homology (~90%) in the N-terminal 450 amino acids, but differ substantially in the amino acid region 451-615. Fusion proteins comprising amino acids 1-450 of HD-1 and 451-615 of HD-73 exhibit the insecticidal properties of wild-type HD-73. The strategy employed was to use the 5' two-thirds of the synthetic HD-1 gene (first 1350 bases, up to the SacI site) and to extensively modify the final 590 bases (through amino acid 645) of the HD-73 gene in a manner consistent with the algorithm used to design the synthetic HD-1 gene. Table VI below lists the oligonucleotides used to modify the HD-73 gene, in the order in which they occur in the gene from the 5' to the 3' end. Nine oligonucleotides, ranging in size from 33 to 60 bases, were used in a 590 base pair region. The only regions left unchanged were areas where there were no long consecutive strings of A or T bases (longer than six). All polyadenylation sites and ATTTA sites were eliminated.
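The criterion for leaving a region unchanged can be expressed as a search for runs of more than six consecutive A or T bases. The Python sketch below assumes such a run may mix A and T (rather than being a run of a single base) and reports 1-based, inclusive coordinates; `at_runs` is a hypothetical helper, not part of the original procedure.

```python
import re


def at_runs(seq: str, min_len: int = 7) -> list:
    """Return 1-based, inclusive (start, end) coordinates of maximal runs of
    A and/or T at least min_len bases long (default 7, i.e. longer than six)."""
    pattern = re.compile(f"[AT]{{{min_len},}}")  # e.g. "[AT]{7,}"
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


print(at_runs("GCGATATTTAAGCTTAAATTTAGC"))  # made-up fragment
```

Regions returned by such a scan would be candidates for codon changes; regions with no qualifying run would be left as they are.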
Table VI

Mutagenesis Primers for B.t.k. HD-73

Primer    Length (bp)   Sequence
73K1363       51        AATACTATCG GATGCGATGA TGTTGTTGAA CTCAGCACTA
                        CGGTGTATCC A
73K1437       33        TCCTGAAATG ACAGAACCGT TGAAGAGAAA GTT
73K1471       48        ATTTCCACTG CTGTTGAGTC TAACGAGGTC TCCACCAGTG
                        AATCCTGG
73K1561       60        GTGAATAGGG GTCACAGAAG CATACCTCAC ACGAACTCTA
                        TATCTGGTAG ATGTTGGATGG
73K1642       33        TGTAGCTGGA ACTGTATTGG AGAAGATGGA TGA
73K1675       48        TTCAAAGTAA CCGAAATCGC TGGATTGGAG ATTATCCAAG
                        GAGGTAGC
73K1741       39        ACTAAAGTTT CTAACACCCA CGATGTTACC GAGTGAAGA



WO 90/10076 


PCT/US90/00778 


Table VI - continued

Mutagenesis Primers for B.t.k. HD-73

Primer    Length (bp)   Sequence
73K1797       36        AACTGGAATG AACTCGAATC TGTCGATAAT CACTCC
73KTERM       54        GGACACTAGA TCTTAGTGAT AATCGGTCAC ATTTGTCTTG
                        AGTCCAAGCT GGTT


The resulting gene has two potential polyadenylation sites (compared with 18 in the wild-type gene) and no ATTTA sequences (12 in the wild-type gene). The G+C content has increased from 37% to 48%. A total of 59 individual base pair changes were made using the primers in Table VI. Overall, there is 90% DNA homology between the region of the HD-73 gene modified by site-directed mutagenesis and the wild-type sequence of the analogous region of HD-73. The synthetic HD-73 gene is a hybrid of the first 1360 bases from the synthetic HD-1 gene and approximately the next 590 bases of modified HD-73 sequence.

Figure 4 is a comparison of the above-described synthetic B.t.k. HD-73 gene and the wild-type B.t.k. HD-73 gene encoding amino acids 1-645. In the modified region of the HD-73 gene, 44 of the 170 codons (approximately 26%) were changed by site-directed mutagenesis with the oligonucleotides of Table VI. Overall, approximately 50% of the codons in the synthetic B.t.k. HD-73 gene differ from the analogous segment of the wild-type HD-73 gene.
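The percentages quoted above come from codon-by-codon and base-by-base comparisons of aligned sequences. A minimal Python sketch, assuming ungapped, in-frame alignment (which holds for substitutions introduced by site-directed mutagenesis), is shown below; the function names are hypothetical. The final lines reproduce the arithmetic behind the quoted figures.

```python
def codon_differences(wild_type: str, synthetic: str) -> tuple:
    """Codon-by-codon comparison of two aligned, in-frame coding sequences.

    Returns (codons changed, codons compared, percent changed).
    """
    if len(wild_type) != len(synthetic) or len(wild_type) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    total = len(wild_type) // 3
    changed = sum(
        wild_type[i:i + 3] != synthetic[i:i + 3] for i in range(0, len(wild_type), 3)
    )
    return changed, total, 100.0 * changed / total


def percent_identity(a: str, b: str) -> float:
    """Ungapped nucleotide identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Arithmetic behind the figures quoted in the text:
print(round(100 * 44 / 170, 1))          # 25.9 (codons changed, modified HD-73 region)
print(round(100 * 356 / 615, 1))         # 57.9 (codons changed, synthetic HD-1 gene)
print(round(100 * (590 - 59) / 590, 1))  # 90.0 (identity after 59 changes in 590 bp)
```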

A one base pair deletion in the synthetic HD-73 gene was detected at base pair 1890 in the course of sequencing the 3' end. This results in a frame-shift mutation at amino acid 625, with a premature stop codon at amino acid 640 (pMON5379). Table VII below compares the codon usage of the wild-type B.t.k. HD-73 gene with that of the synthetic gene of this example for amino acids 451-645, and with the codon usage of naturally occurring genes of dicotyledonous plants. The total number of each amino acid encoded in this segment of the gene is given in parentheses under the amino acid designation.
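A one-base deletion changes every downstream codon until an out-of-frame stop codon is reached, which is why the deletion near base pair 1890 alters the C-terminus and truncates the protein. The Python sketch below illustrates the mechanism on a short hypothetical sequence, not on the HD-73 sequence itself, and assumes the standard genetic code; the function names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon (stop not included)."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(cds: str, position: int) -> str:
    """Remove the base at a 1-based position."""
    return cds[:position - 1] + cds[position:]


original = "ATGAAAGCTAGCTTAGGGTAA"  # hypothetical toy sequence, not HD-73
print(translate(original))                  # MKASLG
print(translate(delete_base(original, 5)))  # MKLA (new downstream codons, then an out-of-frame TAG)
```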

Table VII

Codon Usage in Synthetic B.t.k. HD-73 Gene

                          Percent Usage in
Amino Acid   Codon   Plants   Wt HD-73   Syn
ARG (10)     CGA        7        10        0
             CGC       11         0        8
             CGG        5        10        0
             CGU       25        20       23
             AGA       29        60       62
             AGG       23         0        8
LEU (12)     CUA        8        25        8
             CUC       20        17       58
             CUG       10        17        8
             CUU       28         8        0
             UUA        5        33        8
             UUG       30         0       17
SER (21)     UCA       14        24       18
             UCC       26        10       27
             UCG        3        10        0
             UCU       21        24       18
             AGC       21         0       14
             AGU       15        33       23
THR (15)     ACA       21        47       38
             ACC       41        13       31
             ACG        7        13        0
             ACU       31        27       31
PRO (7)      CCA       45        71       71
             CCC       19         0        0
             CCG        9        14        0
             CCU       26        14       29
ALA (14)     GCA       23        29       31
             GCC       32         7        8
             GCG        3        21       15
             GCU       41        43       46
GLY (15)     GGA       32        33       43
             GGC       20         0        0
             GGG       11        27       14
             GGU       37        40       43
ILE (15)     AUA       12        33        7
             AUC       45         7       40
             AUU       43        60       53
VAL (15)     GUA        9        40        7
             GUC       20         0        7
             GUG       28        20       36
             GUU       43        40       50
LYS (3)      AAA       36        67      100
             AAG       64        33        0
ASN (20)     AAC       72        20       53
             AAU       28        80       47
GLN (5)      CAA       64        60       67
             CAG       36        40       33
HIS (3)      CAC       65        67      100
             CAU       35        33        0
GLU (7)      GAA       48        86       57
             GAG       52        14       43
ASP (5)      GAC       48        40       50
             GAU       52        60       50
TYR (5)      UAC       68         0       20
             UAU       32       100       80
CYS (0)      UGC       78         0        0
             UGU       22         0        0
PHE (13)     UUC       56         8       67
             UUU       44        92       33
MET (2)      AUG      100       100      100
TRP (2)      UGG      100       100      100
Another truncated synthetic HD-73 gene was constructed. The sequence of this synthetic HD-73 gene is identical to that of the above synthetic HD-73 gene in the region in which they overlap (amino acids 29-615), and it also encodes Met-Ala at the N-terminus. Figure 8 shows a comparison of this truncated synthetic HD-73 gene with the N-terminal Met-Ala versus the wild-type HD-73 gene.

While the previous examples have been directed at the preparation of synthetic and modified genes encoding truncated B.t.k. proteins, synthetic or modified genes can also be prepared which encode full length toxin proteins.

One full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotides 1-1845 plus wild-type HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. Figure 9 shows a comparison of this synthetic/wild-type full length HD-73 gene versus the wild-type full length HD-73 gene.

Another full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotides 1-1845 plus a modified HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. The C-terminal portion has been modified by site-directed mutagenesis to remove putative polyadenylation signals and ATTTA sequences according to the algorithm of Figure 1. Figure 10 shows a comparison of this synthetic/modified full length HD-73 gene versus the wild-type full length HD-73 gene.
Another full length B.t.k. gene consists of a fully synthetic HD-73 sequence which incorporates the synthetic HD-73 sequence of Figure 4 from nucleotides 1-1845 plus a synthetic sequence encoding amino acids 616 to the C-terminus of the native protein. The C-terminal synthetic portion has been designed to eliminate putative polyadenylation signals and ATTTA sequences and to include plant preferred codons. Figure 11 shows a comparison of this fully synthetic full length HD-73 gene versus the wild-type full length HD-73 gene.

Alternatively, another full length B.t.k. gene consists of a fully synthetic sequence comprising base pairs 1-1830 of B.t.k. HD-1 (Figure 3) and base pairs 1834-3534 of B.t.k. HD-73 (Figure 11).
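The hybrid and full length constructions above are defined by coordinate ranges taken from different source sequences. This Python sketch shows that bookkeeping under two assumptions: coordinates are 1-based and inclusive, as in the text, and each segment should begin and end on a codon boundary so the joined sequence stays in frame (a safeguard added for illustration, not a step described in the source). The function names are hypothetical. For the last construction, bases 1-1830 of HD-1 and 1834-3534 of HD-73 both satisfy the check.

```python
def take_segment(seq: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive), requiring codon-aligned boundaries."""
    if (start - 1) % 3 or end % 3:
        raise ValueError(f"segment {start}-{end} is not codon-aligned")
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"segment {start}-{end} is outside the sequence")
    return seq[start - 1:end]


def assemble_hybrid(segments) -> str:
    """Concatenate (sequence, start, end) segments into one coding sequence."""
    return "".join(take_segment(seq, start, end) for seq, start, end in segments)


# Coordinates quoted for the HD-1/HD-73 full length construction:
for start, end in [(1, 1830), (1834, 3534)]:
    print(start, end, (start - 1) % 3 == 0 and end % 3 == 0, end - start + 1)
print((1830 + 1701) // 3)  # codons in the assembled sequence
```

Together the two segments give 3,531 bp, or 1,177 codons.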

Example 4 — Expression of Modified and Synthetic B.t.k. HD-1 and Synthetic HD-73

A number of plant transformation vectors for the expression of B.t.k. genes were constructed by incorporating the structural coding sequences of the previously described genes into the plant transformation cassette vector pMON893. The respective intermediate transformation vector is inserted into a suitable disarmed Agrobacterium vector such as A. tumefaciens ACO, supra. Tissue explants are cocultured with the disarmed Agrobacterium vector and plants are regenerated under selection for kanamycin resistance using known protocols: tobacco (Horsch et al., 1985); tomato (McCormick et al., 1986) and cotton (Trolinder et al., 1987).

a) Tobacco. 

The levels of B.t.k. HD-1 protein in transgenic tobacco plants containing pMON9921 (wild-type truncated), pMON5370 (modified HD-1, Example 1, Figure 2) and pMON5377 (synthetic HD-1, Example 2, Figure 3) were analyzed by Western analysis. Leaf tissue was frozen in liquid nitrogen, ground to a fine powder and then ground in SDS-PAGE sample buffer at a 1:2 (wt:vol) ratio. Samples were frozen on dry ice, then incubated for 10 minutes in a boiling water bath and microfuged for 10 minutes. The protein concentration of the supernatant was determined by the method of Bradford (Anal. Biochem. 72:248-254). Fifty µg of protein was run per lane on 9% SDS-PAGE gels, the protein was transferred to nitrocellulose, and the B.t.k. HD-1 protein was visualized using antibodies produced against B.t.k. HD-1 protein as the primary antibody and an alkaline phosphatase-conjugated second antibody, as described by the manufacturer (Promega, Madison, WI). Purified HD-1 tryptic fragment was used as the control. Whereas the B.t.k. protein from tobacco plants containing pMON9921 was below the level of detection, the B.t.k. protein from plants containing the modified (pMON5370) and synthetic (pMON5377) genes was easily detected. The B.t.k. protein from plants containing pMON9921 remained undetectable even with 10-fold longer incubation times. The relative levels of B.t.k. HD-1 protein in these plants are estimated in Table VIII. Because the protein from plants containing pMON9921 was not observed, the level of protein in these plants was estimated from the relative mRNA levels (see below). Plants containing the modified gene (pMON5370) expressed approximately 100-fold more B.t.k. protein than plants containing the wild-type gene (pMON9921). Plants containing the fully synthetic B.t.k. HD-1 gene (pMON5377) expressed approximately five-fold more protein than plants containing the modified gene. The modified gene thus contributes the majority of the increase in B.t.k. expression observed. The plants used to generate the above data are the best representatives from each construct, based either on a tobacco hornworm bioassay or on data derived from previous Western analysis.

Table VIII

Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene          Vector     B.t.k. Protein     Fold Increase in
Description              Concentration*     B.t.k. Expression
Wild type     pMON9921        10                  1
Modified      pMON5370      1000                100
Synthetic     pMON5377      5000                500

* B.t.k. protein concentrations are expressed in ng/mg of total soluble protein. The level of B.t.k. protein for plants containing the wild-type gene is estimated from mRNA levels.
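Table VIII reports concentrations in ng per mg of total soluble protein, whereas later passages express levels as a percentage of total soluble protein. Since 1 mg is 10^6 ng, x ng/mg corresponds to x/10^4 percent, so 1000 ng/mg is 0.1% and 5000 ng/mg is 0.5%; the fold-increase column is each concentration divided by the wild-type value. The helper names in this Python sketch are hypothetical.

```python
def ng_per_mg_to_percent(ng_per_mg: float) -> float:
    """ng of target protein per mg total soluble protein -> percent of total.

    1 mg = 1e6 ng, so x ng/mg = x / 1e6 of the total = x / 1e4 percent.
    """
    return ng_per_mg / 1e4


def fold_increase(value: float, reference: float) -> float:
    """Ratio of a measured level to a reference level."""
    return value / reference


# Table VIII concentrations (ng/mg); the wild-type value was estimated from mRNA.
levels = {"pMON9921": 10, "pMON5370": 1000, "pMON5377": 5000}
for vector, level in levels.items():
    print(vector, f"{ng_per_mg_to_percent(level):g}%", fold_increase(level, levels["pMON9921"]))
```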

Plants containing these genes were tested for bioactivity to determine whether the increased quantities of protein observed by Western analysis result in a corresponding increase in bioactivity. Leaves from the same plants used for the Western data in Table VIII were tested for bioactivity against two insects. A detached leaf bioassay was first done using tobacco hornworm, an extremely sensitive lepidopteran insect. Leaves from all three transgenic tobacco plants were totally protected, and 100% mortality of tobacco hornworm was observed (see Table IX below). A much less sensitive insect, beet armyworm, was then used in another detached leaf bioassay. Beet armyworm is approximately 500-fold less sensitive to B.t.k. HD-1 protein than tobacco hornworm; the difference in sensitivity of these two insects was determined using purified HD-1 protein in a diet incorporation assay (see below). Plants containing the wild-type gene (pMON9921) showed only minimal protection against beet armyworm, whereas plants containing the modified gene showed almost complete protection, and plants containing the fully synthetic gene were totally protected against beet armyworm damage. The results of these bioassays confirm the levels of B.t.k. HD-1 expression observed in the Western analysis and demonstrate that the increased levels of B.t.k. HD-1 protein correlate with increased insecticidal activity.


Table IX

Protection of Tobacco Plants from Tobacco Hornworm and Beet Armyworm

Gene          Vector     Tobacco Hornworm   Beet Armyworm
Description              Damage*            Damage*
None          None       NL                 NL
Wild type     pMON9921   0                  3
Modified      pMON5370   0                  1
Synthetic     pMON5377   0                  0

* Extent of insect damage was rated: 0, no damage; 1, slight; 2, moderate; 3, severe; NL, no leaf left.


The bioactivity of the B.t.k. HD-1 protein produced by these transgenic plants was further investigated to quantitate the relative activities more accurately. Leaf tissue from tobacco plants containing the wild-type, modified and synthetic genes was ground in 100 mM sodium carbonate buffer, pH 10, at a 1:2 (wt:vol) ratio. Particulate material was removed by centrifugation. The supernatant was incorporated into a synthetic diet similar to that described by Marrone et al. (1985). The diet medium was prepared on the day of the test, with the plant extract solutions incorporated in place of the 20% water component. One ml of the diet was aliquoted into 96-well plates. After the diet dried, one neonate tobacco budworm larva was added to each well. Sixteen insects were tested with each plant sample. The plates were incubated at 27°C. After seven days, the larvae from each treatment were combined and weighed on an analytical balance. The average weight per insect was calculated and compared to a standard curve relating B.t.k. protein concentrations to average larval weight. Insect weight was inversely related, in a logarithmic manner, to B.t.k. protein concentration. The amount of B.t.k. HD-1 protein, based on the extent of larval growth inhibition, was determined for two different plants containing each of the three genes, and the specific activity (ng of B.t.k. HD-1 per mg of plant protein) was determined for each plant. Plants containing the modified HD-1 gene (pMON5370) averaged approximately 1400 ng (1200 and 1600 ng) of B.t.k. HD-1 per mg of plant extract protein. This value compares closely with the 1000 ng of B.t.k. HD-1 protein per mg of plant extract protein determined by Western analysis (Table VIII). B.t.k. HD-1 concentrations for the plants containing the synthetic HD-1 gene averaged approximately 8200 ng (7200 and 9200 ng) of B.t.k. HD-1 protein per mg of plant extract protein. This number compares well with the 5000 ng of HD-1 protein per mg of plant extract protein estimated by Western analysis. Correspondingly, plants containing the synthetic gene showed an approximately six-fold higher specific activity than the corresponding plants containing the modified gene in these bioassays; in the Western analysis the ratio was approximately five-fold, so the two measurements are again in good agreement. The level of B.t.k. protein in plants containing the wild-type HD-1 gene (pMON9921) was too low to give a significant decrease in larval weight and hence was below a level that could be quantitated in this assay. In conclusion, the levels of B.t.k. HD-1 protein determined by both the bioassays and the Western analysis agree for the plants containing the modified and synthetic genes, which demonstrates that the B.t.k. HD-1 protein produced by these plants is biologically active.
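The bioassay quantitation depends on a standard curve relating average larval weight to B.t.k. protein concentration. The exact curve form used in the original work is not given; the Python sketch below assumes a straight-line least-squares fit of weight against log10(concentration), which matches the logarithmic relationship described, and shows how such a curve could be inverted to estimate B.t.k. content from the mean weight of larvae fed a plant extract. Function names are hypothetical and no experimental values are reproduced.

```python
import math


def fit_log_linear(concs, weights):
    """Least-squares fit of weight = a + b * log10(concentration); returns (a, b)."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weights))
    b = sxy / sxx  # expected to be negative: heavier larvae at lower B.t.k. doses
    return mean_y - b * mean_x, b


def concentration_from_weight(weight, a, b):
    """Invert the fitted standard curve: estimate concentration from mean larval weight."""
    return 10 ** ((weight - a) / b)
```

Presumably the standards would be diets containing known amounts of purified HD-1 protein; the estimated concentration in an extract would then be divided by the extract's total protein to express it as ng per mg.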

The levels of mRNA were determined in the plants containing the wild-type B.t.k. HD-1 gene (pMON9921) and the modified gene (pMON5370) to establish whether the increased levels of protein production result from increased transcription or translation. mRNA from plants containing the synthetic gene could not be analyzed directly with the same DNA probe as used for the wild-type and modified genes because of the numerous changes made in the coding sequence. mRNA was isolated and hybridized with a single-stranded DNA probe homologous to approximately the 5' 90 bp of the wild-type or modified gene coding sequences. The hybrids were digested with S1 nuclease and the protected probe fragments were analyzed by gel electrophoresis. Because the procedure used a large excess of probe and a long hybridization time, the amount of protected probe is proportional to the amount of B.t.k. mRNA present in the sample. Two plants expressing the modified gene (pMON5370) were found to produce up to ten-fold more RNA than a plant expressing the wild-type gene (pMON9921).

The increased mRNA level from the modified gene is consistent with the result expected from the modifications introduced into this gene. However, this 10-fold increase in mRNA with the modified gene compared to the wild-type gene contrasts with the 100-fold increase in B.t.k. protein from these genes in tobacco plants. If the two mRNAs were equally well translated, then a 10-fold increase in stable mRNA would be expected to yield a 10-fold increase in protein. The larger increase in protein indicates that the modified gene mRNA is translated at about a 10-fold higher efficiency than the wild-type mRNA. Thus, about half of the total effect on gene expression can be explained by changes in mRNA levels and about half by changes in translational efficiency. This increase in translational efficiency is striking in that only about 9.5% of the codons have been changed in the modified gene; that is, this effect is clearly not due to wholesale codon usage changes. The increased translational efficiency could be due to changes in mRNA secondary structure that affect translation, or to the removal of specific translational blockades caused by the specific codons that were changed.
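Because protein output is, to a first approximation, the product of steady-state mRNA level and translational efficiency (protein made per mRNA), the "about half and half" statement holds on a logarithmic scale. Writing P for protein, M for mRNA and E for translational efficiency (symbols introduced here for illustration, assuming equal protein stability for the two gene products), the approximate fold changes given above combine as follows:

```latex
\[
\frac{P_{\mathrm{mod}}}{P_{\mathrm{wt}}}
  = \frac{M_{\mathrm{mod}}}{M_{\mathrm{wt}}}\cdot\frac{E_{\mathrm{mod}}}{E_{\mathrm{wt}}}
  \quad\Longrightarrow\quad
  100 = 10 \cdot \frac{E_{\mathrm{mod}}}{E_{\mathrm{wt}}}
  \quad\Longrightarrow\quad
  \frac{E_{\mathrm{mod}}}{E_{\mathrm{wt}}} = 10,
\]
\[
\log_{10} 100
  = \underbrace{\log_{10} 10}_{\text{mRNA level}}
  + \underbrace{\log_{10} 10}_{\text{translational efficiency}}
  = 1 + 1 = 2 .
\]
```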

The increased expression seen with the synthetic HD-1 gene was also seen with a synthetic HD-73 gene in tobacco. B.t.k. HD-73 was undetected in extracts of tobacco plants containing the wild-type truncated HD-73 gene (pMON5367), whereas B.t.k. HD-73 protein was easily detected in extracts from tobacco plants containing the synthetic HD-73 gene of Figure 4 (pMON5383). Approximately 1000 ng of B.t.k. HD-73 protein was detected per mg of total soluble plant protein.

As described in Example 3 above, the B.t.k. HD-73 protein encoded in pMON5383 contains a small C-terminal extension of amino acids not encoded in the wild-type HD-73 protein. These extra amino acids had no effect on insect toxicity or on increased plant expression. A second synthetic HD-73 gene was constructed as described in Example 3 (Figure 8) and used to transform tobacco (pMON5390). Analysis of plants containing pMON5390 showed that this gene was expressed at levels comparable to that of pMON5383 and that these plants had similar insecticidal efficacy.

In tobacco plants the synthetic HD-1 gene was expressed at approximately a 5-fold higher level than the synthetic HD-73 gene. However, this synthetic HD-73 gene was still expressed at least 100-fold better than the wild-type HD-73 gene. The HD-73 protein is approximately 5-fold more toxic to many insect pests than the HD-1 protein, so the synthetic HD-1 and HD-73 genes provide approximately comparable insecticidal efficacy in tobacco.

The full length B.t.k. HD-73 genes described in Example 3 were also incorporated into the plant transformation vector pMON893 so that they were expressed from the En 35S promoter. The synthetic/wild-type full length HD-73 gene of Figure 9 was incorporated into pMON893 to create pMON10505. The synthetic/modified full length HD-73 gene of Figure 10 was incorporated into pMON893 to create pMON10526. The fully synthetic HD-73 gene of Figure 11 was incorporated into pMON893 to create pMON10518. These vectors were used to obtain transformed tobacco plants, and the plants were analyzed for insecticidal efficacy and for B.t.k. HD-73 protein levels by Western blot or ELISA immunoassay.

Tobacco plants containing all three of these full length B.t.k. genes produced detectable B.t.k. protein and showed 100% mortality of tobacco hornworm. This result is surprising in light of previously reported attempts to express full length B.t.k. genes in transgenic plants. Vaeck et al. (1987) reported that a full length B.t.k. berliner gene similar to our HD-1 gene could not be detectably expressed in tobacco. Barton et al. (1987) reported a similar result for another full length gene from B.t.k. HD-1 (the so-called 4.5 kb gene), and further indicated that tobacco callus containing this gene became necrotic, indicating that the full length gene product was toxic to plant cells. Fischhoff et al. (1987) reported that the full length B.t.k. HD-1 gene in tomato was poorly expressed compared to a truncated gene, and no plants that were fully toxic to tobacco hornworm could be recovered. All three of these reports indicated much higher expression levels and recovery of toxic plants when the respective B.t.k. genes were truncated. Adang et al. reported that the full length HD-73 gene yielded a few tobacco plants with some biological activity against hornworm (none were highly toxic) and barely detectable B.t.k. protein. They also noted that the major B.t.k. mRNA in these plants was a truncated 1.7 kb species that would not encode a functional toxin, indicating improper expression of the gene in tobacco. In contrast to all of these reports, the three full length B.t.k. HD-73 genes described above all led to relatively high levels of protein and high levels of insect toxicity.

B.t.k. protein and mRNA levels in tobacco plants are shown in Table X for these three vectors. As can be seen from the table, the synthetic/wild-type gene (pMON10506) produces B.t.k. protein as about 0.01% of total soluble protein; the synthetic/modified gene produces B.t.k. as about 0.02% of total soluble protein; and the fully synthetic gene produces B.t.k. as about 0.2% of total soluble protein. B.t.k. mRNA was analyzed in these plants by Northern blot analysis, using the common 5' synthetic half of the genes as a probe. As shown in Table X, the increased protein levels can largely be attributed to increased mRNA levels. By comparison with the truncated modified and synthetic genes, this could indicate that the major contributors to increased translational efficiency are in the 5' half of the gene, while the 3' half of the gene contains mostly determinants of mRNA stability. The increased protein levels also indicate that increasing the proportion of the full length gene that is synthetic or modified increases B.t.k. protein levels. Compared to the truncated synthetic B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully synthetic gene (pMON10518) produces as much or slightly more B.t.k. protein, demonstrating that full length genes are capable of being expressed at high levels in plants.

These tobacco plants with high levels of full length HD-73 protein show no evidence of abnormality and are fully fertile. The B.t.k. protein levels in these plants also produce the expected levels of insect toxicity, based on feeding studies with beet armyworm or diet incorporation assays of plant extracts with tobacco budworm. The B.t.k. protein detected by Western blot analysis in these tobacco plants often contains a varying amount of a protein of about 80 kDa, which is apparently a proteolytic fragment of the full length protein. The C-terminal half of the full length protein is known to be proteolytically sensitive, and similar proteolytic fragments are seen from the full length gene in E. coli and in B.t. itself. These fragments are fully insecticidal. The Northern analysis indicated that essentially all of the mRNA from these full length genes was of the expected full length size; there is no evidence of truncated mRNAs that could give rise to the 80 kDa protein fragment. In addition, it is possible that the fragment is not present in intact plant cells and is merely due to proteolysis during extraction for immunoassay.
Table X

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tobacco Plants

Gene                  Vector      B.t.k. Protein    Relative B.t.k.
Description                       Concentration     mRNA Level
Synthetic/wild type   pMON10506   >100              0.5
Synthetic/modified    pMON10526   400               1
Fully synthetic       pMON10518   >2000             40


Thus, there is no serious impediment to producing high levels of B.t.k. HD-73 protein in plants from synthetic genes, and this is expected to be true of other full length lepidopteran-active genes such as B.t.k. HD-1 or B.t. entomocidus. The fully synthetic B.t.k. HD-1 gene of Example 3 has been assembled in plant transformation vectors such as pMON893.

The fully synthetic gene in pMON10518 was also
utilized in another plant vector and analyzed in
tobacco plants. Although the CaMV35S promoter is
generally a high level constitutive promoter in most
plant tissues, the expression level of genes driven by
the CaMV35S promoter is low in floral tissue relative
to the levels seen in leaf tissue. Because the
economically important targets damaged by some insects
are the floral parts or are derived from floral parts
(e.g., cotton squares and bolls, tobacco buds, tomato
buds and fruit), it may be advantageous to increase
the expression of B.t. protein in these tissues over
that obtained with the CaMV35S promoter.

The 35S promoter of Figwort Mosaic Virus (FMV) is
analogous to the CaMV35S promoter. This promoter has
been isolated and engineered into a plant
transformation vector analogous to pMON893. Relative
to the CaMV promoter, the FMV 35S promoter is highly
active in floral tissue, while still providing
similar high levels of gene expression in other
tissues such as leaf. A plant transformation vector,
pMON10517, was constructed in which the full length
synthetic B.t.k. HD-73 gene of Figure 11 was driven by
the FMV 35S promoter. This vector is identical to
pMON10518 of Example 3 except that the FMV promoter is
substituted for the CaMV promoter. Tobacco plants
transformed with pMON10517 and pMON10518 were obtained
and compared for expression of the B.t.k. protein by
Western blot or ELISA immunoassay in leaf and floral
tissue. This analysis showed that pMON10517,
containing the FMV promoter, expressed the full length
HD-73 protein at higher levels in floral tissue than
pMON10518, containing the CaMV promoter. Expression of
the full length B.t.k. HD-73 protein from pMON10517 in
leaf tissue is comparable to that seen with the most
highly expressing plants containing pMON10518.
However, when floral tissue was analyzed, tobacco
plants containing pMON10518 that had high levels of
B.t.k. protein in leaf tissue did not have detectable
B.t.k. protein in the flowers. On the other hand,
flowers of tobacco plants containing pMON10517 had
levels of B.t.k. protein nearly as high as the levels
in leaves, at approximately 0.05% of total soluble
protein. This analysis showed that the FMV promoter
could be used to produce relatively high levels of
B.t.k. protein in floral tissue compared to the CaMV
promoter.

b) Tomato. 

The wild-type, modified and synthetic B.t.k. HD-1
genes tested in tobacco were introduced into other
plants to demonstrate the broad utility of this
invention. Transgenic tomatoes were produced which
contain these three genes. Data show that the
increased expression observed with the modified and
synthetic genes in tobacco also extends to tomato.
Whereas the B.t.k. HD-1 protein is only barely
detectable in plants containing the wild-type HD-1
gene (pMON9921), B.t.k. HD-1 was readily detected and
its levels determined in plants containing the
modified (pMON5370) or synthetic (pMON5377) genes.
Expression levels for the plants containing the
wild-type, modified and synthetic HD-1 genes were
approximately 10, 100 and 500 ng per mg of total plant
extract (see Table XI below). The increase in B.t.k.
HD-1 protein for the modified gene accounted for the
majority of the increase observed: a 10-fold increase over the
plants containing the wild-type gene, compared to only
an additional five-fold increase for plants containing
the synthetic gene. Again, the site-directed changes
made in the modified gene are the major contributors
to the increased expression of B.t.k. HD-1.

Table XI

B.t.k. HD-1 Expression in
Transgenic Tomato Plants

Gene          Vector     B.t.k. protein    Fold increase in
description              concentration*    B.t.k. expression
Wild type     pMON9921   10                1
Modified      pMON5370   100               10
Synthetic     pMON5377   500               50

* B.t.k. HD-1 protein concentrations are expressed in
ng/mg of total soluble plant protein. Data for plants
containing the wild-type gene are estimates from mRNA
levels and protein levels determined by ELISA.
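The fold-increase column of Table XI is each concentration divided by the wild-type value. The minimal Python sketch below reproduces that column from the tabulated concentrations and separates the overall 50-fold gain into the 10-fold wild-type-to-modified step and the further 5-fold modified-to-synthetic step described above. The names `TABLE_XI` and `fold_increase` are illustrative only.

```python
# B.t.k. HD-1 concentrations from Table XI, ng/mg total soluble protein.
TABLE_XI = {"wild type": 10.0, "modified": 100.0, "synthetic": 500.0}


def fold_increase(levels: dict, baseline: str) -> dict:
    """Divide every level by the level of the baseline entry."""
    base = levels[baseline]
    return {name: value / base for name, value in levels.items()}


folds = fold_increase(TABLE_XI, "wild type")
step = TABLE_XI["synthetic"] / TABLE_XI["modified"]  # modified -> synthetic
print(folds)  # {'wild type': 1.0, 'modified': 10.0, 'synthetic': 50.0}
print(step)   # 5.0
```

The two steps multiply (10 x 5 = 50), which is why the modified gene is described as contributing the majority of the total increase.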

These differences in B.t.k. HD-1 expression were
confirmed with bioassays against tobacco hornworm and
beet armyworm. Leaves from tomato plants containing
each of these genes controlled tobacco hornworm damage
and produced 100% mortality. With beet armyworm,
leaves from plants containing the wild-type HD-1 gene
(pMON9921) showed significant damage, leaves from
plants containing the modified gene (pMON5370) showed
less damage and leaves from plants containing the
synthetic gene (pMON5377) were completely protected
(see Table XII below).

Table XII

Protection of Tomato Plants from
Tobacco Hornworm and Beet Armyworm

Gene          Vector     Tobacco hornworm   Beet armyworm
description              damage*            damage*
None          None       NL                 NL
Wild type     pMON9921   0                  3
Modified      pMON5370   0                  1
Synthetic     pMON5377   0                  0

* Damage was rated as shown in Table IX.

The generality of the synthetic gene approach was
extended in tomato with a synthetic B.t.k. HD-73 gene.
In tomato, extracts from plants containing the
wild-type truncated HD-73 gene (pMON5367) showed no
detectable HD-73 protein. Extracts from plants
containing the synthetic HD-73 gene (pMON5383) showed
high levels of B.t.k. HD-73 protein, approximately
2000 ng per mg of plant extract protein. These data
clearly demonstrate that the changes made in the
synthetic HD-73 gene lead to dramatic increases in the
expression of the HD-73 protein in tomato as well as
in tobacco.

In contrast to tobacco, the synthetic HD-73 gene in
tomato is expressed at approximately 4-fold to 5-fold
higher levels than the synthetic HD-1 gene. Because
the HD-73 protein is about 5-fold more active than the
HD-1 protein against many insect pests including
Heliothis species, the increased expression of
synthetic HD-73 compared to synthetic HD-1 corresponds
to about a 25-fold increase in insecticidal efficacy in
tomato.
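The 25-fold figure combines two multiplicative factors. Assuming, as a simplification, that insecticidal efficacy scales linearly with both the amount of protein and its specific activity, the estimate is the product of the expression ratio and the activity ratio. The hypothetical helper below makes that explicit for both ends of the 4- to 5-fold expression range.

```python
def relative_efficacy(expression_ratio: float, activity_ratio: float) -> float:
    """Relative efficacy under the simplifying assumption that efficacy scales
    linearly with both protein amount and specific activity."""
    return expression_ratio * activity_ratio


# Synthetic HD-73 vs synthetic HD-1 in tomato: ~4-5x expression, ~5x activity.
low = relative_efficacy(4, 5)
high = relative_efficacy(5, 5)
print(low, high)  # 20 25
```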

In order to determine the mechanisms involved in
the increased expression of modified and synthetic
B.t.k. HD-1 genes in tomato, S1 nuclease analysis of
mRNA levels from transformed tomato plants was
performed. As indicated above, a similar analysis had
been performed with tobacco plants, and this analysis
showed that the modified gene produced up to 10-fold
more mRNA than the wild-type gene. The analysis in
tomato utilized a different DNA probe that allowed the
analysis of wild-type (pMON9921), modified (pMON5370)
and synthetic (pMON5377) HD-1 genes with the same
probe. This probe was derived from the 5'
untranslated region of the CaMV35S promoter in pMON893
that was common to all three of these vectors
(pMON9921, pMON5370 and pMON5377). This S1 analysis
indicated that B.t.k. mRNA levels from the modified
gene were 3- to 5-fold higher than for the wild-type
gene, and that mRNA levels for the synthetic gene were
about 2- to 3-fold higher than for the modified gene.
Three independent transformants were analyzed for each
gene. Compared to the fold increases in B.t.k. HD-1
protein from these genes in tomato shown in Table XI,
these mRNA increases can explain about half of the
total protein increase, as was seen in tobacco for the
wild-type and modified genes. For tomato, the total
mRNA increase from wild-type to synthetic is about 6- to 15-fold,
compared to a protein increase of about 50-fold.

WO 90/10076
PCT/US90/00778

This result is similar to that seen for tobacco
in comparing the wild-type and modified genes, and it
extends to the synthetic gene as well. That is, about
half of the total fold increase in B.t.k. protein from
wild-type to modified genes can be explained by mRNA
increases and about half by enhanced translational
efficiency. The same is also true in comparing the
modified gene to the synthetic gene. Although there
is an additional increase in RNA levels, this mRNA
increase can explain only about half of the total
protein increase.
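One way to make the statement that mRNA accounts for "about half" of a protein increase precise is to assume that the protein fold change equals the mRNA fold change times a per-transcript (translational) fold change, and to measure the mRNA share on a logarithmic scale. This interpretation is an assumption made for illustration, not a calculation given in the text; the function name `decompose` is hypothetical.

```python
import math


def decompose(protein_fold: float, mrna_fold: float) -> tuple:
    """Split a protein fold change into mRNA and residual translational parts.

    Assumes protein_fold = mrna_fold * translation_fold; the mRNA share is
    measured on a log scale, so a share of 0.5 means the two factors are equal.
    """
    translation_fold = protein_fold / mrna_fold
    mrna_share = math.log(mrna_fold) / math.log(protein_fold)
    return translation_fold, mrna_share


# Tomato, wild type -> modified HD-1: ~10-fold protein, 3- to 5-fold mRNA.
for mrna in (3, 5):
    t, share = decompose(10, mrna)
    print(f"mRNA {mrna}x: translation {t:.1f}x, mRNA share {share:.2f}")
```

Under this reading, the wild-type to modified step (3- to 5-fold mRNA, about 10-fold protein) gives mRNA shares of roughly 0.48 to 0.70, and the wild-type to synthetic comparison (6- to 15-fold mRNA, about 50-fold protein) gives roughly 0.46 to 0.69, with the remainder attributed to translational efficiency.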

The full length B.t.k. genes described above were
also used to transform tomato plants, and these plants
were analyzed for B.t.k. protein and insecticidal
efficacy. The results of this analysis are shown in
Table XIII. Plants containing the synthetic/wild-type
gene (pMON10506) produce the B.t.k. HD-73 protein at
levels of about 0.01% of their total soluble protein.
Plants containing the synthetic/modified gene
(pMON10526) produce about 0.04% B.t.k. protein, and
plants containing the fully synthetic gene (pMON10518)
produce about 0.2% B.t.k. protein. These results are
very similar to the tobacco plant results for the same
genes. mRNA levels estimated by Northern blot
analysis in tomato also increase in parallel with the
protein level increase. As for tobacco with these
three genes, most of the protein increase can be
attributed to increased mRNA, with a small component of
translational efficiency increase indicated for the
fully synthetic gene. The highest levels of full
length B.t.k. protein (from pMON10518) are comparable
to or just slightly lower than the highest levels
observed for the truncated HD-73 genes (pMON5383 and
pMON5390). Tomato plants expressing these full length
genes have the insecticidal activity expected for the
observed protein levels, as determined by feeding
assays with beet armyworm or by diet incorporation of
plant extracts with tobacco hornworm.


Table XIII

Full Length B.t.k. HD-73 Protein and
mRNA Levels in Transgenic Tomato Plants

Gene description      Vector      B.t.k. protein    Relative B.t.k.
                                  concentration     mRNA level
Synthetic/wild type   pMON10506   100               1
Synthetic/modified    pMON10526   400               2-4
Fully synthetic       pMON10518   2000              10


c) Cotton. 

The generality of the increased expression of
B.t.k. HD-1 and B.t.k. HD-73 by use of the modified
and synthetic genes was extended to cotton.
Transgenic calli were produced which contain the
wild-type (pMON9921) and the synthetic HD-1 (pMON5377)
genes. Here again, the B.t.k. HD-1 protein produced
from calli containing the wild-type gene was not
detected, whereas calli containing the synthetic HD-1
gene expressed the HD-1 protein at easily detectable
levels. The HD-1 protein was produced at
approximately 1000 ng/mg of plant callus extract
protein. Again, to ensure that the protein produced by
the transgenic cotton calli was biologically active
and that the increased expression observed with the
synthetic gene translated to increased biological
activity, extracts of cotton calli were made in a
similar manner as described for tobacco plants, except
that the calli were first dried between Whatman filter
paper to remove as much of the water as possible. The
dried calli were then ground in liquid nitrogen and
further ground in 100 mM sodium carbonate buffer, pH 10.
Aliquots of approximately 0.5 ml of this material were
applied to tomato leaves with a paint brush. After
the leaves dried, five tobacco hornworm larvae were
applied to each of two leaf samples. Leaves painted
with extract from control calli were completely
destroyed. Leaves painted with extract from calli
containing the wild-type HD-1 gene (pMON9921) showed
severe damage. Leaves painted with extract from calli
containing the synthetic HD-1 gene (pMON5377) showed
no damage (see Table XIV below).

Table XIV

Gene               Vector      Tobacco hornworm damage*
Control            Control     NL
Wild type HD-1     pMON9921    3
Synthetic HD-1     pMON5377    0
Synthetic HD-73    pMON5383    0

* Damage was rated as shown in Table VIII.

Cotton calli were also produced containing another
synthetic gene, a gene encoding B.t.k. HD-73. The
preparation of this gene is described in Example 3.
Calli containing the synthetic HD-73 gene produced the
corresponding HD-73 protein at even higher levels than
the calli which contained the synthetic HD-1 gene.
Extracts made from calli containing the HD-73
synthetic gene (pMON5383) showed complete control of
tobacco hornworm when painted onto tomato leaves as
described above for extracts containing the HD-1
protein (see Table XIV).

Transgenic cotton plants containing the synthetic
B.t.k. HD-1 gene (pMON5377) or the synthetic B.t.k.
HD-73 gene (pMON5383) have also been examined. These
plants produce the HD-1 or HD-73 proteins at levels
comparable to those seen in cotton callus with the same
genes and comparable to tomato and tobacco plants with
these genes. For either the synthetic truncated HD-1 or
HD-73 gene, cotton plants expressing B.t.k. protein
at 1000 to 2000 ng/mg total protein (0.1% to 0.2%)
were recovered at a high frequency. Insect feeding
assays were performed with leaves from cotton plants
expressing the synthetic HD-1 or HD-73 genes. These
leaves showed no damage (rating of 0) when challenged
with larvae of cabbage looper (Trichoplusia ni), and
only slight damage when challenged with larvae of beet
armyworm (Spodoptera exigua). Damage ratings are as
defined in Table VIII above. This demonstrated that
cotton plants, as well as calli, expressed the synthetic
HD-1 or HD-73 genes at high levels and that those
plants were protected from damage by lepidopteran
insect larvae.

Transgenic cotton plants containing either the
synthetic truncated HD-1 gene (pMON5377) or the
synthetic truncated HD-73 gene (pMON5383) were also
assessed for protection against cotton bollworm at the
whole plant level in the greenhouse. This is a more
realistic test of the ability of these plants to
produce an agriculturally acceptable level of control.
The cotton bollworm (Heliothis zea) is a major pest of
cotton that produces economic damage by destroying
terminals, squares and bolls, and protection of these
fruiting bodies as well as the leaf tissue will be
important for effective insect control and adequate
crop protection. To test the protection afforded to
whole plants, R1 progeny of cotton plants expressing
high levels of either B.t.k. HD-1 (pMON5377) or B.t.k.
HD-73 (pMON5383) were assayed by applying 10-15 eggs
of cotton bollworm per boll or square to the 20
uppermost squares or bolls on each plant. At least 12
plants were analyzed per treatment. The hatch rate of
the eggs was approximately 70%. This corresponds to
very high insect pressure compared to the numbers of
larvae per plant seen under typical field conditions.
Under these conditions, 100% of the bolls on control
cotton plants were destroyed by insect damage. For
the transgenic plants, significant boll protection was
observed. In plants containing pMON5377 (HD-1), 70-75%
of the bolls survived the intense pressure of this
assay. Plants containing pMON5383 (HD-73) had 80% to
90% boll protection. This is likely to be a
consequence of the higher activity of HD-73 protein
against cotton bollworm compared to HD-1 protein. In
cases where the transgenic plants were damaged by the
insects, the surviving larvae were delayed in their
development by at least one instar.
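The insect pressure in the greenhouse assay follows from the numbers given: eggs per fruiting body, times the 20 infested squares or bolls, times the roughly 70% hatch rate. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def larvae_per_plant(eggs_per_fruit: float, fruits_per_plant: int, hatch_rate: float) -> float:
    """Expected hatched larvae per plant: eggs x infested fruiting bodies x hatch rate."""
    return eggs_per_fruit * fruits_per_plant * hatch_rate


low = larvae_per_plant(10, 20, 0.70)   # 10 eggs on each of 20 squares/bolls
high = larvae_per_plant(15, 20, 0.70)  # 15 eggs on each of 20 squares/bolls
print(round(low), round(high))  # 140 210
```

That is on the order of 140 to 210 newly hatched larvae per plant, consistent with the description of very high insect pressure.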

Therefore, the increased expression obtained with
the modified and synthetic genes is not limited to any
one crop; tobacco, tomato, cotton calli and cotton
plants all showed drastic increases in B.t.k.
expression when the plants or calli were produced
containing the modified or synthetic genes. Likewise,
the utility of the changes made to produce the modified
and synthetic B.t.k. HD-1 genes is not limited to the
HD-1 gene. The synthetic HD-73 gene also showed drastic
increases in expression in all three species.

In summary, it has been demonstrated that: (1) the
genetic changes made in the HD-1 modified gene lead to
very significant increases in B.t.k. HD-1 expression;
(2) production of a totally synthetic gene led to a
further five-fold increase in B.t.k. HD-1 expression;
(3) the changes incorporated into the modified HD-1
gene accounted for the majority of the increased
B.t.k. expression observed with the synthetic gene;
(4) the increased expression was demonstrated in three
different plants: tobacco plants, tomato plants, and
cotton calli and cotton plants; (5) the increased
expression as observed by Western analysis also
correlated with similar increases in bioactivity,
showing that the B.t.k. HD-1 proteins produced were
comparably active; (6) when the method of the present
invention used to design the synthetic HD-1 gene was
employed to design a synthetic HD-73 gene, it also was
expressed at much higher levels in tobacco, tomato and
cotton than the wild-type equivalent gene, with
consequent increases in bioactivity; and (7) a fully
synthetic full length B.t.k. gene was expressed at
levels comparable to synthetic truncated genes.

Example 5 - Synthetic B.t. tenebrionis Gene in
Tobacco, Tomato and Potato

Referring to Figure 12, a synthetic gene encoding a
Coleopteran active toxin is prepared by making the
indicated changes in the wild-type gene of B.t.
tenebrionis or by de novo synthesis of the synthetic
structural gene. The synthetic gene is inserted into
an intermediate plant transformation vector such as
pMON893. Plasmid pMON893 containing the synthetic
B.t.t. gene is then inserted into a suitable disarmed
Agrobacterium strain such as A. tumefaciens ACO.

Transformation and Regeneration of Potato 

Sterile shoot cultures of Russet Burbank are
maintained in vials containing 10 ml of PM medium
(Murashige and Skoog (MS) inorganic salts, 30 g/l
sucrose, 0.17 g/l NaH2PO4·H2O, 0.4 mg/l thiamine-HCl,
and 100 mg/l myo-inositol, solidified with 1 g/l
Gelrite at pH 6.0). When shoots reach approximately
5 cm in length, stem internode segments of 7-10 mm are
excised and smeared at the cut ends with a disarmed
Agrobacterium tumefaciens vector containing the
synthetic B.t.t. gene from a four-day-old plate
culture. The stem explants are co-cultured for three
days at 23°C on a sterile filter paper placed over 1.5
ml of a tobacco cell feeder layer overlaid on 1/10 P
medium (1/10 strength MS inorganic salts and organic
addenda without casein as in Jarret et al. (1980), 30
g/l sucrose and 8.0 g/l agar). Following co-culture,
the explants are transferred to full strength P-1
medium for callus induction, composed of MS inorganic
salts, organic additions as in Jarret et al. (1980)
with the exception of casein, 3.0 mg/l benzyladenine
(BA), and 0.01 mg/l naphthaleneacetic acid (NAA)
(Jarret et al., 1980). Carbenicillin (500 mg/l) is
included to inhibit bacterial growth, and 100 mg/l
kanamycin is added to select for transformed cells.
After four weeks the explants are transferred to
medium of the same composition but with 0.3 mg/l
gibberellic acid (GA3) replacing the BA and NAA
(Jarret et al., 1981) to promote shoot formation.
Shoots begin to develop approximately two weeks after
transfer to shoot induction medium; these are excised
and transferred to vials of PM medium for rooting.
Shoots are tested for kanamycin resistance, conferred
by the enzyme neomycin phosphotransferase II, by
placing a section of the stem onto callus induction
medium containing MS organic and inorganic salts, 30
g/l sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 10 mg/l
GA3 (Webb et al., 1983) and 200 mg/l kanamycin to
select for transformed cells.
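PM medium is specified per litre, while the shoot cultures are kept in 10 ml vials. The sketch below scales the per-litre amounts listed above to a batch of vials. The vial count is an arbitrary example, MS inorganic salts are left out because the text gives no mass for them, and the dictionary and function names are hypothetical.

```python
# Per-litre amounts of PM medium components listed above, in mg.
# MS inorganic salts are omitted because no mass is given in the text.
PM_MEDIUM_MG_PER_L = {
    "sucrose": 30_000.0,
    "NaH2PO4.H2O": 170.0,
    "thiamine-HCl": 0.4,
    "myo-inositol": 100.0,
    "Gelrite": 1_000.0,
}


def scale_recipe(mg_per_l: dict, n_vials: int, ml_per_vial: float = 10.0) -> dict:
    """Return mg of each component needed for n_vials vials of ml_per_vial ml each."""
    litres = n_vials * ml_per_vial / 1000.0
    return {name: amount * litres for name, amount in mg_per_l.items()}


batch = scale_recipe(PM_MEDIUM_MG_PER_L, n_vials=50)  # 50 vials x 10 ml = 0.5 l
for name, mg in batch.items():
    print(f"{name}: {mg:g} mg")
```

Keeping amounts in a single unit (mg) avoids mixing g/l and mg/l entries when the recipe is scaled.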

The synthetic B.t.t. gene described in Figure 12
was placed into a plant expression vector as described
in Example 5. The plasmid has the following
characteristics: a synthetic BglII fragment of
approximately 1800 base pairs was inserted into
pMON893 in such a manner that the enhanced 35S
promoter would express the B.t.t. gene. This
construct, pMON1982, was used to transform both
tobacco and tomato. Tobacco plants selected as
kanamycin resistant were screened with rabbit
anti-B.t.t. antibody. Cross-reactive material was
detected at levels predicted to be suitable to cause
mortality to CPB. These target insects will not feed
on tobacco, but the transgenic tobacco plants do
demonstrate that the synthetic gene improves
expression of this protein to detectable levels.


Tomato plants with the pMON1982 construct were
determined to produce B.t.t. protein at levels
insecticidal to CPB. In initial studies, the leaves
of four plants (5190, 5225, 5328 and 5133) showed
little or no damage when exposed to CPB larvae (damage
rating of 0-1 on a scale of 0 to 4, with 4 as no leaf
remaining). Under these conditions the control leaves
were completely eaten. Immunological analysis of
these plants confirmed the presence of material
cross-reactive with anti-B.t.t. antibody. Levels of protein
expression in these plants were estimated at
approximately 1 to 5 ng of B.t.t. protein in 50 µg of
total extractable protein. A total of 17 tomato
plants (17 of 65 tested) have been identified which
demonstrate protection of leaf tissue from CPB (rating
of 0 or 1) and show good insect mortality.

Results similar to those seen in tobacco and tomato
with pMON1982 were seen with pMON1984 in the same
plant species. pMON1984 is identical to pMON1982
except that the synthetic protease inhibitor (CMTI) is
fused upstream of the native proteolytic cleavage
site. Levels of expression in tobacco were estimated
to be similar to pMON1982, between 10 and 15 ng per 50 µg
of total soluble protein.

Tomato plants expressing pMON1984 have been 
identified which protect the leaves from ingestion by 
CPB. The damage rating was 0 with 100% insect 
mortality. 

Potato was transformed as described in Example 5
with a vector similar to pMON1982 containing the
enhanced CaMV35S/synthetic B.t.t. gene. Leaves of
potato plants transformed with this vector were
screened by CPB insect bioassay. Of the 35 plants
tested, leaves from 4 plants (16a, 13c, 13d and 23a)
were totally protected when challenged. Insect
bioassays with leaves from three other plants, 13e,
1a and 13b, recorded damage levels of 1 on a scale of
0 to 4, with 4 being total devastation of the leaf
material. Immunological analysis confirmed the
presence of B.t.t. cross-reactive material in the leaf
tissue. The level of B.t.t. protein in leaf tissue of
plant 16a (damage rating of 0) was estimated at 20-50
ng of B.t.t. protein/50 µg of total soluble protein.
The levels of B.t.t. protein seen in 16a tissue were
consistent with its biological activity.
Immunological analysis of 13e and 13b (tissue which
scored 1 in damage rating) revealed less protein (5-10
ng/50 µg of total soluble protein) than in plant 16a.
Cuttings of plant 16a were challenged with 50 to 200
eggs of CPB in a whole plant assay. Under these
conditions 16a showed no damage and 100% mortality of
insects, while control potato plants were heavily
damaged.
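Expression levels in this section are reported as ng of toxin per 50 µg of total soluble protein, while other passages use ng per mg or percent of total soluble protein. The helper below, a sketch with hypothetical names, converts both forms to a percentage so the figures can be compared directly.

```python
NG_PER_UG = 1_000.0      # nanograms per microgram
NG_PER_MG = 1_000_000.0  # nanograms per milligram


def percent_of_total(toxin_ng: float, total_protein_ng: float) -> float:
    """Toxin as a percentage of total soluble protein (both amounts in ng)."""
    return 100.0 * toxin_ng / total_protein_ng


# Potato plant 16a: 20-50 ng B.t.t. per 50 ug of total soluble protein.
print([round(percent_of_total(ng, 50 * NG_PER_UG), 3) for ng in (20, 50)])  # [0.04, 0.1]
# Cotton plants: 1000-2000 ng B.t.k. per mg of total protein.
print([round(percent_of_total(ng, NG_PER_MG), 3) for ng in (1000, 2000)])  # [0.1, 0.2]
```

For example, the 20-50 ng/50 µg estimated for potato plant 16a corresponds to about 0.04% to 0.1% of total soluble protein, or 400 to 1000 ng/mg.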

Example 6 - Synthetic B.t.k. P2 Protein Gene

The P2 protein is a distinct insecticidal protein
produced by some strains of B.t., including B.t.k.
HD-1. It is characterized by its activity against both
lepidopteran and dipteran insects (Yamamoto and
Iizuka, 1983). Genes encoding the P2 protein have
been isolated and characterized (Donovan et al.,
1988). The P2 proteins encoded by these genes are
approximately 600 amino acids in length. These
proteins share only limited homology with the
lepidopteran specific P1 type proteins, such as the
B.t.k. HD-1 and HD-73 proteins described in previous
examples.

The P2 proteins have substantial activity against a
variety of lepidopteran larvae including cabbage
looper, tobacco hornworm and tobacco budworm. Because
they are active against agronomically important insect
pests, the P2 proteins are a desirable candidate in
the production of insect tolerant transgenic plants,
either alone or in combination with the other B.t.
toxins described in the above examples. In some
plants, expression of the P2 protein alone might be
sufficient to provide protection against damaging
insects. In addition, the P2 proteins might provide
protection against agronomically important dipteran
pests. In other cases, expression of P2 together with
the B.t.k. HD-1 or HD-73 protein might be preferred.
The P2 proteins should provide at least an additive
level of insecticidal activity when combined with the
crystal protein toxin of B.t.k. HD-1 or HD-73, and the
combination may even provide a synergistic activity.
Although the mode of action of the P2 protein is
unknown, its distinct amino acid sequence suggests
that it functions differently from the B.t.k. HD-1 and
HD-73 type of proteins. Production of two insect
tolerance proteins with different modes of action in
the same plant would minimize the potential for
development of insect resistance to B.t. proteins in
plants. The lack of substantial DNA homology between
P2 genes and the HD-1 and HD-73 genes minimizes the
potential for recombination between multiple insect
tolerance genes in the plant chromosome.

The genes encoding the P2 protein, although distinct
in sequence from the B.t.k. HD-1 and HD-73 genes, share
many common features with these genes. In particular,
the P2 protein genes have a high A+T content (65%),
multiple potential polyadenylation signal sequences
(26) and numerous ATTTA sequences (10). Because of
their overall similarity to the poorly expressed
wild-type B.t.k. HD-1 and HD-73 genes, the same problems
are expected in expression of the wild-type P2 gene as
were encountered with the previous examples. Based on
the above-described method for designing the synthetic
B.t. genes, a synthetic P2 gene has been designed
which should be expressed at adequate levels for
protection in plants. A comparison of the wild-type
and synthetic P2 genes is shown in Figure 13.
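The sequence features cited for the P2 gene (A+T content, potential polyadenylation signals and ATTTA motifs) can be tallied with a simple scan. The sketch below counts overlapping motif occurrences. The two polyadenylation-signal variants it uses by default (AATAAA and ATTAAA) are an assumption for illustration and not the complete set used in the design method, so its counts will not necessarily match the figures quoted above. The optional `limit` argument restricts the scan to a leading window, analogous to counting within the first 1875 nucleotides as is done for the Btent gene.

```python
def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def scan_sequence(seq: str, polya_motifs=("AATAAA", "ATTAAA"), limit=None) -> dict:
    """Report A+T fraction, potential polyadenylation signals and ATTTA motifs.

    polya_motifs is an illustrative default, not an exhaustive list.
    limit, if given, restricts the scan to the first `limit` nucleotides.
    """
    seq = seq.upper()
    if limit is not None:
        seq = seq[:limit]
    at = (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0
    return {
        "at_fraction": at,
        "polya_signals": sum(count_overlapping(seq, m) for m in polya_motifs),
        "attta": count_overlapping(seq, "ATTTA"),
    }


# Toy sequence for illustration only (not a B.t. gene).
toy = "GCATTTATTTAAATAAAGC"
print(scan_sequence(toy))
print(scan_sequence(toy, limit=8))
```

Overlapping counting matters here because AT-rich motifs such as ATTTA can share bases (in the toy sequence, ATTTATTTA contains two ATTTA motifs).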

Example 7 - Synthetic B.t. entomocidus Gene

The B.t. entomocidus ("Btent") protein is a
distinct insecticidal protein produced by some strains
of B.t. bacteria. It is characterized by its high
level of activity against some lepidopterans that are
relatively insensitive to B.t.k. HD-1 and HD-73, such
as Spodoptera species including beet armyworm (Visser
et al., 1988). Genes encoding the Btent protein have
been isolated and characterized (Honee et al., 1988).
The Btent proteins encoded by these genes are
approximately the same length as B.t.k. HD-1 and
HD-73. These proteins share only 68% amino acid homology
with the B.t.k. HD-1 and HD-73 proteins. It is likely
that only the N-terminal half of the Btent protein is
required for insecticidal activity, as is the case for
HD-1 and HD-73. Over the first 625 amino acids, Btent
shares only 38% amino acid homology with HD-1 and
HD-73.

Because of their higher activity against Spodoptera
species that are relatively insensitive to HD-1 and
HD-73, the Btent proteins are a desirable candidate for
the production of insect tolerant transgenic plants,
either alone or in combination with the other B.t.
toxins described in the above examples. In some plants,
production of Btent alone might be sufficient to
control the agronomically important pests. In other
plants, the production of two distinct insect
tolerance proteins would provide protection against a
wider array of insects. Against those insects where
both proteins are active, the combination of the
B.t.k. HD-1 or HD-73 type protein plus the Btent
protein should provide at least additive insecticidal
efficacy, and may even provide a synergistic activity.
In addition, because of its distinct amino acid
sequence, the Btent protein may have a different mode
of action than HD-1 or HD-73. Production of two
insecticidal proteins in the same plant with different
modes of action would minimize the potential for
development of insect resistance to B.t. proteins in
plants. The relative lack of DNA sequence homology
with the B.t.k. type genes minimizes the potential for
recombination between multiple insect tolerance genes
in the plant chromosome.

The genes encoding the Btent protein, although
distinct in sequence from the B.t.k. HD-1 and HD-73
genes, share many common features with these genes. In
particular, the Btent protein genes have a high A+T
content (62%), multiple potential polyadenylation
signal sequences (39 in the full length coding
sequence and 27 in the first 1875 nucleotides that are
likely to encode the active toxic fragment) and
numerous ATTTA sequences (16 in the full length coding
sequence and 12 in the first 1875 nucleotides).
Because of their overall similarity to the poorly
expressed wild-type B.t.k. HD-1 and HD-73 genes, the
wild-type Btent genes are expected to exhibit similar
problems in expression as were encountered with the
wild-type HD-1 and HD-73 genes. Based on the
above-described method used for designing the other
synthetic B.t. genes, a synthetic Btent gene has been
designed which should be expressed at adequate
levels for protection in plants. A comparison of the
wild-type and synthetic Btent genes is shown in
Figure 14.

Example 8 - Synthetic B.t.k. Genes for Expression in Corn

High level expression of heterologous genes in corn
cells has been shown to be enhanced by the presence of
a corn gene intron (Callis et al., 1987). Typically
these introns have been located in the 5' untranslated
region of the chimeric gene. It has been shown that
the CaMV35S promoter and the NOS 3' end function
efficiently in the expression of heterologous genes in
corn cells (Fromm et al., 1986).

Referring to Figure 15, a plant expression cassette
vector (pMON744) was constructed that contains these
sequences. Specifically, the expression cassette
contains the enhanced CaMV 35S promoter followed by
intron 1 of the corn Adh1 gene (Callis et al., 1987).
This is followed by a multilinker cloning site for
insertion of coding sequences; this multilinker
contains a BglII site among others. Following the
multilinker is the NOS 3' end. pMON744 also contains
the selectable marker gene 35S/NPTII/NOS 3' for
kanamycin selection of transgenic corn cells. In
addition, pMON744 has an E. coli origin of replication
and an ampicillin resistance gene for selection of the
plasmid in E. coli.

Five B.t.k. coding sequences described in the
previous examples were inserted into the BglII site of
pMON744 for corn cell expression of B.t.k. The coding
sequences inserted and the resulting vectors were:

1. Wild-type B.t.k. HD-1 from pMON9921 to make
pMON8652.

2. Modified B.t.k. HD-1 from pMON5370 to make
pMON8642.

3. Synthetic B.t.k. HD-1 from pMON5377 to make
pMON8643.

4. Synthetic B.t.k. HD-73 from pMON5390 to make
pMON8644.

5. Synthetic full-length B.t.k. HD-73 from
pMON10518 to make pMON10902.

pMON8652 (wild-type B.t.k. HD-1) was used to
transform corn cell protoplasts, and stably transformed
kanamycin-resistant callus was isolated. B.t.k. mRNA
in the corn cells was analyzed by nuclease S1
protection and found to be present at a level
comparable to that seen with the same wild-type coding
sequence (pMON9921) in transgenic tomato plants.

pMON8652 and pMON8642 (modified HD-1) were used to
transform corn cell protoplasts in a transient
expression system. The level of B.t.k. mRNA was
analyzed by nuclease S1 protection. The modified HD-1
gene gave rise to a several-fold increase in B.t.k. mRNA
compared to the wild-type coding sequence in the
transiently transformed corn cells. This indicated
that the modifications introduced into the B.t.k. HD-1
gene are capable of enhancing B.t.k. expression in
monocot cells, as was demonstrated for dicot plants and
cells.

pMON8642 (modified HD-1) and pMON8643 (synthetic
HD-1) were used to transform Black Mexican Sweet (BMS)
corn cell protoplasts by PEG-mediated DNA uptake, and
stably transformed corn callus was selected by growth
on kanamycin-containing plant growth medium.
Individual callus colonies that were derived from
single transformed cells were isolated and propagated
separately on kanamycin-containing medium.

To assess the expression of the B.t.k. genes in
these cells, callus samples were tested for insect
toxicity by bioassay against tobacco hornworm larvae.
For each vector, 96 callus lines were tested by
bioassay. Portions of each callus were placed on
sterile water agar plates, and five neonate tobacco
hornworm larvae were added and allowed to feed for 4
days. For pMON8643, 100% of the larvae died after
feeding on 15 of the 96 calli, and these calli showed
little feeding damage. For pMON8642, only 1 of the 96
calli was toxic to the larvae. This showed that the
B.t.k. gene was being expressed in these samples at
insecticidal levels. The observation that
significantly more calli containing pMON8643 were
toxic than calli containing pMON8642 showed that
significantly higher levels of expression were obtained
when the synthetic HD-1 coding sequence was contained in
corn cells than when the modified HD-1 coding sequence
was used, similar to the previous examples with dicot
plants. A semiquantitative immunoassay showed that
the pMON8643 toxic samples had significantly higher
B.t.k. protein levels than the pMON8642 toxic sample.
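Whether 15 toxic calli out of 96 versus 1 out of 96 is more than a chance difference can be checked with a standard test for 2x2 count tables. The sketch below uses a two-sided Fisher exact test; the choice of test, the helper name `fisher_exact_two_sided`, and the assumption that each callus line is an independent observation are ours, not the source's.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same row
    and column totals that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x, with margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are kept
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# rows: pMON8643, pMON8642; columns: toxic calli, non-toxic calli
p_value = fisher_exact_two_sided(15, 96 - 15, 1, 96 - 1)
print(f"two-sided p = {p_value:.1e}")
```

Under those assumptions the two-sided p-value is on the order of 10^-4, in line with the statement that the difference between the two vectors is significant.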

The 16 callus samples that were toxic to tobacco 
hornworm were also tested for activity against 
European corn borer. European corn borer is 
approximately 40-fold less sensitive to the HD-1 gene 
product than is tobacco hornworm. Larvae of European 
corn borer were applied to the callus samples and 
allowed to feed for 4 days. Two of the 16 calli 
tested, both of which contained pMON8643 (synthetic HD-1),
were toxic to European corn borer larvae.
To assess the expression of the B.t.k. genes in
differentiated corn tissue, another method of DNA
delivery was used. Young leaves were excised from
corn plants, and DNA samples were delivered into the
leaf tissue by microprojectile bombardment. In this
system, the DNA on the microprojectiles is transiently
expressed in the leaf cells after bombardment. Three
DNA samples were used, and each DNA was tested in
triplicate.

1. pMON744, the corn expression vector with no
B.t.k. gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS
gene, with no B.t.k. gene.

The leaves were incubated at room temperature for
24 hours. The pMON752 samples were stained with a
substrate that allows visual detection of the GUS gene
product. This analysis showed that over one hundred
spots in each sample were expressing the GUS product
and that the triplicate samples showed very similar
levels of GUS expression. For the pMON744 and
pMON8643 samples, five tobacco hornworm larvae were
added to each leaf and allowed to feed for 48 hours.
All three samples bombarded with pMON744 showed
extensive feeding damage and no larval mortality. All
three samples bombarded with pMON8643 showed no
evidence of feeding damage and 100% larval mortality.
The samples were also assayed for the presence of
B.t.k. protein by a qualitative immunoassay. All of
the pMON8643 samples had detectable B.t.k. protein.

These results demonstrated that the synthetic
B.t.k. gene was expressed in differentiated corn plant
tissue at insecticidal levels.

Example 9. Synthetic Potato Leaf Roll Virus Coat
Protein Gene

Expression in plants of the coat protein genes from
a variety of plant viruses has proven to be an
effective method of engineering resistance to these
viruses. In order to achieve virus resistance, it is
important to express the viral coat protein at an
effective level. For many plant virus coat protein
genes, this has not proved to be a problem. However,
for the coat protein gene from potato leaf roll virus
(PLRV), expression of the coat protein has been
observed to be low relative to other coat protein
genes, and this lower level of protein has not led to
optimal resistance to PLRV.

The gene for PLRV coat protein is shown in Figure
16. Referring to Figure 16, the upper line of
sequence shows the gene as it was originally
engineered for plant expression in vector pMON893.
The gene was contained on a 749-nucleotide BglII-EcoRI
fragment, with the coding sequence contained between
nucleotides 20 and 643. This fragment also contained
19 nucleotides of 5' noncoding sequence and 104
nucleotides of 3' noncoding sequence. This PLRV coat
protein gene was relatively poorly expressed in plants
compared to other viral coat protein genes.
A synthetic gene was designed to improve plant
expression of the PLRV coat protein. Referring again
to Figure 16, the changes made in the synthetic PLRV
gene are shown in the lower line. This gene was
designed to encode exactly the same protein as the
naturally occurring gene. Note that the beginning of
the synthetic gene is at nucleotide 14 and the end of
the sequence is at nucleotide 654. The coding
sequence for the synthetic gene is from nucleotide 20
to 643 of the figure. The changes indicated just
upstream and downstream of these endpoints serve only
to introduce convenient restriction sites just outside
the coding sequence. Thus the size of the synthetic
gene is 641 nucleotides, which is smaller than the
naturally occurring gene. The synthetic gene is
smaller because substantially all of the noncoding
sequence at both the 5' and 3' ends, except for
segments encoding the BglII and EcoRI restriction
sites, has been removed.
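The lengths quoted above follow from the inclusive coordinates of Figure 16 (length = end - start + 1). A short check, using a hypothetical helper name:

```python
def span_length(start: int, end: int) -> int:
    """Number of nucleotides in the inclusive interval [start, end]."""
    return end - start + 1


synthetic_gene = span_length(14, 654)  # synthetic gene endpoints (Figure 16)
coding = span_length(20, 643)          # coding sequence, same in both genes
five_prime_noncoding = span_length(1, 19)  # 5' noncoding part of the native fragment

assert coding % 3 == 0, "a coding sequence should be a whole number of codons"
print(synthetic_gene, coding, coding // 3, five_prime_noncoding)  # 641 624 208 19
```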

The synthetic gene differs from the naturally
occurring gene in two main respects. First, 41
individual codons within the coding sequence have been
changed to remove nearly all codons for a given amino
acid that constitute less than about 15% of the codons
for that amino acid in a survey of dicot plant genes.
Second, the 5' and 3' noncoding sequences of the
original gene have been removed. Although the design
does not strictly conform to the algorithm described in
Figure 1, a few of the codon changes, and especially
the removal of the long 3' noncoding region, are
consistent with this algorithm.
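The first kind of change can be sketched as a table lookup: any codon whose share of the codons for its amino acid falls below the threshold is replaced by a synonymous codon, so the encoded protein is unchanged. In this minimal sketch the usage fractions are placeholders (not the dicot survey referred to above), only three amino acids are tabulated, and the replacement rule (always pick the most frequent synonym) is an assumption. A practical design would also have to check that a substitution does not create new sequences of the kinds targeted by the Figure 1 algorithm; this sketch does not.

```python
# Hypothetical usage fractions: share of each codon among all codons for
# its amino acid. Placeholder values only, NOT the dicot survey data.
CODON_FRACTIONS = {
    "L": {"CTT": 0.26, "TTG": 0.26, "CTC": 0.17, "CTG": 0.11,
          "TTA": 0.10, "CTA": 0.10},
    "R": {"AGA": 0.30, "AGG": 0.25, "CGT": 0.20, "CGC": 0.10,
          "CGA": 0.08, "CGG": 0.07},
    "G": {"GGA": 0.36, "GGT": 0.34, "GGC": 0.15, "GGG": 0.15},
}
CODON_TO_AA = {codon: aa for aa, table in CODON_FRACTIONS.items()
               for codon in table}


def replace_rare_codons(cds: str, threshold: float = 0.15) -> str:
    """Swap each codon used below `threshold` for its amino acid with the
    most frequent synonymous codon. Codons missing from the table are kept."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is not None and CODON_FRACTIONS[aa][codon] < threshold:
            table = CODON_FRACTIONS[aa]
            codon = max(table, key=table.get)  # synonymous, so protein unchanged
        out.append(codon)
    return "".join(out)


print(replace_rare_codons("CTAAGGCGAGGG"))  # CTTAGGAGAGGG (CTA->CTT, CGA->AGA)
```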
The original PLRV sequence contains two potential
plant polyadenylation signals (AACCAA and AAGCAT), and
both of these occur in the 3' noncoding sequence
that has been removed in the synthetic gene. The
original PLRV gene also contains an ATTTA sequence.
This is also contained in the 3' noncoding sequence,
and is in the midst of the longest stretch of
uninterrupted A+T in the gene (a stretch of 7 A+T
nucleotides). This sequence was removed in the
synthetic gene. Thus, sequences that the algorithm of
Figure 1 targets for change have been changed in the
synthetic PLRV coat protein gene by removal of the 3'
noncoding segment. Within the coding sequence, codon
changes were also made to remove three other regions
of the kinds of sequence described above. In particular,
two regions of 5 consecutive A+T and one region of 5
consecutive G+C within the coding sequence have been
removed in the synthetic gene.
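The sequence features discussed in this example (polyadenylation signals such as AACCAA and AAGCAT, the ATTTA motif, and runs of five or more consecutive A+T or G+C nucleotides) can be located by simple pattern matching. A minimal sketch that only reports positions and does not attempt any redesign; the function names are hypothetical, the signal list is the one given in Claim 1, and the example sequence is made up for illustration:

```python
import re

# Plant polyadenylation signals as listed in Claim 1 of this document
POLYA_SIGNALS = [
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
]


def motif_positions(seq: str, motif: str) -> list[int]:
    """1-based start of every occurrence of `motif`, overlaps included."""
    return [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]


def runs(seq: str, bases: str, min_len: int = 5) -> list[tuple[int, int]]:
    """(1-based start, length) of maximal runs made only of `bases`."""
    pattern = f"[{bases}]{{{min_len},}}"  # e.g. "[AT]{5,}"
    return [(m.start() + 1, len(m.group())) for m in re.finditer(pattern, seq)]


def scan(seq: str) -> dict:
    """Report the sequence features discussed in the text."""
    seq = seq.upper()
    return {
        "polyA": {s: motif_positions(seq, s) for s in POLYA_SIGNALS if s in seq},
        "ATTTA": motif_positions(seq, "ATTTA"),
        "AT_runs": runs(seq, "AT"),
        "GC_runs": runs(seq, "GC"),
    }


print(scan("GCAACCAAGCATTTAGCCCGGTA"))
```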

The synthetic PLRV coat protein gene is cloned in a
plant transformation vector such as pMON893 and used
to transform potato plants as described above. These
plants express the PLRV coat protein at higher levels
than achieved with the naturally occurring gene, and
these plants exhibit increased resistance to infection
by PLRV.
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SaamPle.lQ — Expression of Synt hetic B.t. Genes 

With -RUBISCQ -Small Subunit Prom oters and 

Chloroplast Transit Peptides 
o 

The genes in plants encoding the small subunit of
RUBISCO (SSU) are often highly expressed, light
regulated and sometimes show tissue specificity.
These expression properties are largely due to the
promoter sequences of these genes. It has been
possible to use SSU promoters to express heterologous
genes in transformed plants. Typically a plant will
contain multiple SSU genes, and the expression levels
and tissue specificity of different SSU genes will be
different. The SSU proteins are encoded in the
nucleus and synthesized in the cytoplasm as precursors
that contain an N-terminal extension known as the
chloroplast transit peptide (CTP). The CTP directs
the precursor to the chloroplast and promotes the
uptake of the SSU protein into the chloroplast. In
this process, the CTP is cleaved from the SSU protein.
These CTP sequences have been used to direct
heterologous proteins into chloroplasts of transformed
plants.

The SSU promoters might have several advantages for
expression of B.t.k. genes in plants. Some SSU
promoters are very highly expressed and could give
rise to expression levels as high as or higher than
those observed with the CaMV35S promoter. The tissue
distribution of expression from SSU promoters is
different from that of the CaMV35S promoter, so for
control of some insect pests, it may be advantageous
to direct the expression of B.t.k. to those cells in
which SSU is most highly expressed. For example,
although relatively constitutive, in the leaf the
CaMV35S promoter is more highly expressed in vascular
tissue than in some other parts of the leaf, while
most SSU promoters are most highly expressed in the
mesophyll cells of the leaf. Some SSU promoters also
are more highly tissue specific, so it could be
possible to utilize a specific SSU promoter to express
B.t.k. in only a subset of plant tissues if, for
example, B.t. expression in certain cells was found to
be deleterious to those cells. For example, for
control of Colorado potato beetle in potato, it may be
advantageous to use SSU promoters to direct B.t.t.
expression to the leaves but not to the edible tubers.

Utilizing SSU CTP sequences to localize B.t.
proteins to the chloroplast might also be
advantageous. Localization of the B.t. protein to the
chloroplast could protect the protein from proteases
found in the cytoplasm. This could stabilize the B.t.
protein and lead to higher levels of accumulation of
active protein. B.t. genes containing the CTP could
be used in combination with the SSU promoter or with
other promoters such as CaMV35S.

A variety of plant transformation vectors were
constructed for the expression of B.t.k. genes
utilizing SSU promoters and SSU CTPs. The promoters
and CTPs utilized were from the petunia SSU11A gene
described by Turner et al. (1986) and from the
Arabidopsis ats1A gene (an SSU gene) described by
Krebbers et al. (1988) and by Elionor et al. (1989).
The petunia SSU11A promoter was contained on a DNA
fragment that extended approximately 800 bp upstream
of the SSU coding sequence. The Arabidopsis ats1A
promoter was contained on a DNA fragment that extended
approximately 1.8 kb upstream of the SSU coding
sequence. At the upstream end, convenient sites from
the multilinker of pUC18 were used to move these
promoters into plant transformation vectors such as
pMON893. These promoter fragments extended to the
start of the SSU coding sequence, at which point an
NcoI restriction site was engineered to allow
insertion of the B.t. coding sequence, replacing the
SSU coding sequence.

When SSU promoters were used in combination with
their CTP, the DNA fragments extended through the
coding sequence of the CTP and a small portion of the
mature SSU coding sequence, at which point an NcoI
restriction site was engineered by standard techniques
to allow the in-frame fusion of B.t. coding sequences
with the CTP. In particular, for the petunia SSU11A
CTP, B.t. coding sequences were fused to the SSU
sequence after amino acid 8 of the mature SSU sequence,
at which point the NcoI site was placed. The 8 amino
acids of mature SSU sequence were included because
preliminary in vitro chloroplast uptake experiments
indicated that uptake of B.t.k. was observed only
if this segment of mature SSU was included. For the
Arabidopsis ats1A CTP, the complete CTP was included
plus 24 amino acids of mature SSU sequence plus the
sequence gly-gly-arg-val-asn-cys-met-gln-ala-met,
terminating in an NcoI site for B.t. fusion. This
short sequence reiterates the native SSU CTP cleavage
site (between the cys and met) plus a short segment
surrounding the cleavage site. This sequence was
included in order to ensure proper uptake into
chloroplasts. B.t. coding sequences were fused to
this ats1A CTP after the met codon. In vitro uptake
experiments with this CTP construction and other
(non-B.t.) coding sequences showed that this CTP did
target proteins to the chloroplast.
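Because the NcoI recognition site (C^CATGG) contains an ATG, placing it at the end of a CTP segment lets the methionine start codon of the B.t. coding sequence serve as the junction codon. The fusion is in frame only if the distance from the CTP start codon to that ATG is a multiple of three and no stop codon intervenes. A minimal check, with a hypothetical function name and made-up example sequences:

```python
NCOI_SITE = "CCATGG"  # NcoI recognition site, cut C^CATGG
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ncoi_fusion_in_frame(seq: str) -> bool:
    """Check a CTP (or signal peptide) segment fused at an NcoI site.

    `seq` must begin with the ATG start codon of the upstream segment.
    Returns True if the ATG inside the first downstream NcoI site is in the
    same reading frame as that start codon and no in-frame stop codon
    occurs before it.
    """
    seq = seq.upper()
    if not seq.startswith("ATG"):
        raise ValueError("sequence must begin at the upstream start codon")
    site = seq.find(NCOI_SITE, 3)
    if site < 0:
        raise ValueError("no NcoI site downstream of the start codon")
    junction_atg = site + 2  # CC|ATG|G: the fused methionine codon
    if junction_atg % 3:
        return False
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(0, junction_atg, 3))


print(ncoi_fusion_in_frame("ATGGCTTCCTCTGCCATGGCT"))   # True
print(ncoi_fusion_in_frame("ATGGCTTCCTCTAGCCATGGCT"))  # False (frame shift)
```

The check looks only at the first NcoI site downstream of the start codon; a segment carrying an additional internal NcoI site would need the junction position supplied explicitly.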

When CTPs were used in combination with the CaMV35S
promoter, the same CTP segments were used. They
were excised just upstream of the ATG start sites of
the CTP by engineering of BglII sites, and placed
downstream of the CaMV35S promoter in pMON893 as
BglII to NcoI fragments. B.t. coding sequences were
fused as described above.

The wild-type B.t.k. HD-1 coding sequence of
pMON9921 (see Figure 1) was fused to the ats1A
promoter to make pMON1925, or to the ats1A promoter plus
CTP to make pMON1921. These vectors were used to
transform tobacco plants, and the plants were screened
for activity against tobacco hornworm. No toxic
plants were recovered. This is surprising in light of
the fact that toxic plants could be recovered, albeit
at a low frequency, after transformation with pMON9921,
in which the B.t.k. coding sequence was expressed from
the enhanced CaMV35S promoter in pMON893, and in
light of the fact that Elionor et al. (1989) report
that the ats1A promoter itself is comparable in
strength to the CaMV35S promoter and approximately
10-fold stronger when the CTP sequence is included. At
least for the wild-type B.t.k. HD-1 coding sequence,
this does not appear to be the case.

A variety of plant transformation vectors were
constructed utilizing either the truncated synthetic
HD-73 coding sequence of Figure 4 or the full-length
B.t.k. HD-73 coding sequence of Figure 11. These are
listed in the table below.
Table XV

Vector      Promoter   CTP      B.t.k. HD-73 coding sequence
pMON10806   En 35S     ats1A    truncated
pMON10814   En 35S     SSU11A   full length
pMON10811   SSU11A     SSU11A   truncated
pMON10819   SSU11A     none     truncated
pMON10815   ats1A      none     truncated
pMON10817   ats1A      ats1A    truncated
pMON10821   En 35S     ats1A    truncated
pMON10822   En 35S     ats1A    full length
pMON10838   SSU11A     SSU11A   full length
pMON10839   ats1A      ats1A    full length

All of the above vectors were used to transform
tobacco plants. For all of the vectors containing
truncated B.t.k. genes, leaf tissue from these plants
has been analyzed for toxicity to insects and for B.t.k.
protein levels by immunoassay. pMON10806, 10811,
10819 and 10821 produce levels of B.t.k. protein
comparable to pMON5383 and pMON5390, which contain
synthetic B.t.k. HD-73 coding sequences driven by the
En 35S promoter itself with no CTP. These plants also
have the insecticidal activity expected for the B.t.k.
protein levels detected. For pMON10815 and pMON10817
(containing the ats1A promoter), the level of B.t.k.
protein is about 5-fold higher than that found in
plants containing pMON5383 or pMON5390. These plants
also have higher insecticidal activity. Plants containing
pMON10815 and pMON10817 contain up to 1% of their total
soluble leaf protein as B.t.k. HD-73. This is the
highest level of B.t.k. protein yet obtained with any
of the synthetic genes.

This result is surprising in two respects. First,
as noted above, the wild-type coding sequences fused
to the ats1A promoter and CTP did not show any
evidence of higher levels of expression than for En
35S, and in fact had lower expression based on the
absence of any insecticidal plants. Second, Elionor
et al. (1989) show that for two other genes, the ats1A
CTP can increase expression from the ats1A promoter by
about 10-fold. For the synthetic B.t.k. HD-73 gene,
there is no consistent increase seen by including the
CTP over and above that seen for the ats1A promoter
alone.

Tobacco plants containing the full-length synthetic
HD-73 gene fused to the SSU11A CTP and driven by the
En 35S promoter (pMON10814) produced levels of B.t.k.
protein and insecticidal activity comparable to
pMON10518, which does not include the CTP. In addition,
for pMON10518 the B.t.k. protein extracted from plants
was observed by gel electrophoresis to contain multiple
forms less than full length, apparently due to the
cleavage of the C-terminal portion (not required for
toxicity) in the cytoplasm. For pMON10814, the
majority of the protein appeared to be intact full
length, indicating that the protein has been stabilized
against proteolysis by targeting to the chloroplast.
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Examp-lfi_ll-— Targeting of B.t. Protei ns to the 

Extracellular Space or Vac uole through the Use of 
5 Signal Peptides 

The B.t. proteins produced from the synthetic genes
described here are localized to the cytoplasm of the
plant cell, and this cytoplasmic localization results
in plants that are insecticidally effective. It may
be advantageous for some purposes to direct the B.t.
proteins to other compartments of the plant cell.
Localizing B.t. proteins in compartments other than
the cytoplasm may result in less exposure of the B.t.
proteins to cytoplasmic proteases, leading to greater
accumulation of the protein and enhanced
insecticidal activity. Extracellular localization
could lead to more efficient exposure of certain
insects to the B.t. proteins, leading to greater
efficacy. If a B.t. protein were found to be
deleterious to plant cell function, then localization
to a noncytoplasmic compartment could protect these
cells from the protein.

In plants as well as other eukaryotes, proteins
that are destined to be localized either
extracellularly or in several specific compartments
are typically synthesized with an N-terminal amino
acid extension known as the signal peptide. This
signal peptide directs the protein to enter the
compartmentalization pathway, and it is typically
cleaved from the mature protein as an early step in
compartmentalization. For an extracellular protein,
the secretory pathway typically involves
cotranslational insertion into the endoplasmic
reticulum, with cleavage of the signal peptide occurring
at this stage. The mature protein then passes through
the Golgi body into vesicles that fuse with the plasma
membrane, thus releasing the protein into the
extracellular space. Proteins destined for other
compartments follow a similar pathway. For example,
proteins that are destined for the endoplasmic
reticulum or the Golgi body follow this scheme, but
they are specifically retained in the appropriate
compartment. In plants, some proteins are also
targeted to the vacuole, another membrane-bound
compartment in the cytoplasm of many plant cells.
Vacuole-targeted proteins diverge from the above
pathway at the Golgi body, where they enter vesicles
that fuse with the vacuole.

A common feature of this protein targeting is the
signal peptide that initiates the compartmentalization
process. Fusing a signal peptide to a protein will in
many cases lead to the targeting of that protein to
the endoplasmic reticulum. The efficiency of this
step may depend on the sequence of the mature protein
itself as well. The signals that direct a protein to
a specific compartment rather than to the
extracellular space are not as clearly defined. It
appears that many of the signals that direct the
protein to specific compartments are contained within
the amino acid sequence of the mature protein. This
has been shown for some vacuole-targeted proteins, but
it is not yet possible to define these sequences
precisely. It appears that secretion into the
extracellular space is the "default" pathway for a
protein that contains a signal sequence but no other
compartmentalization signals. Thus, a strategy to
direct B.t. proteins out of the cytoplasm is to fuse
synthetic B.t. genes to DNA sequences
encoding known plant signal peptides. These fusion
genes will give rise to B.t. proteins that enter the
secretory pathway and lead to extracellular
secretion or targeting to the vacuole or other
compartments.

Signal sequences for several plant genes have been
described. One such sequence is for the tobacco
pathogenesis-related protein PR1b described by
Cornelissen et al. The PR1b protein is normally
localized to the extracellular space. Another type of
signal peptide is contained on seed storage proteins
of legumes. These proteins are localized to the
protein body of seeds, which is a vacuole-like
compartment found in seeds. A signal peptide DNA
sequence for the beta subunit of the 7S storage
protein of common bean (Phaseolus vulgaris), PvuB, has
been described by Doyle et al. Based on these
published sequences, genes were synthesized by
chemical synthesis of oligonucleotides that encoded
the signal peptides for PR1b and PvuB. The synthetic
genes for these signal peptides corresponded exactly
to the reported DNA sequences. Just upstream of the
translational initiation codon of each signal peptide,
a BamHI and a BglII site were inserted, with the BamHI
site at the 5' end. This allowed the insertion of the
signal peptide encoding segments into the BglII site
of pMON893 for expression from the En 35S promoter.
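BamHI (G^GATCC) and BglII (A^GATCT) both leave a 5'-GATC overhang, so a fragment with a BamHI end can be ligated into a BglII site; the junction formed is recognized by neither enzyme. The following minimal sketch computes the overhangs and the junction sequence. The helper names are hypothetical, the enzyme table is limited to the three enzymes named in this example, and `hybrid_site` assumes the two ends are compatible.

```python
# Recognition site and top-strand cut offset for three enzymes named in the
# text. All three sites are palindromic and leave 4-base 5' overhangs.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "NcoI": ("CCATGG", 1),   # C^CATGG
}


def overhang(enzyme: str) -> str:
    """5' overhang of a palindromic site cut at the same offset on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Two sticky ends can be ligated to each other if their overhangs match."""
    return overhang(enzyme_a) == overhang(enzyme_b)


def hybrid_site(left: str, right: str) -> str:
    """Top-strand junction formed when a `left` end is ligated to a `right` end."""
    left_site, cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    return left_site[:len(left_site) - cut] + right_site[len(right_site) - right_cut:]


print(compatible("BamHI", "BglII"), hybrid_site("BamHI", "BglII"))  # True GGATCT
```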

In some cases, to achieve secretion or
compartmentalization of heterologous proteins, it has
proved necessary to include some amino acid sequence
beyond the normal cleavage site of the signal peptide.
This may be necessary to ensure proper cleavage of the
signal peptide. For PR1b, the synthetic DNA sequence
also included the first 10 amino acids of mature PR1b.
For PvuB, the synthetic DNA sequence included the first
13 amino acids of mature PvuB. Both synthetic signal
peptide encoding segments ended with NcoI sites to
allow fusion in frame to the methionine initiation
codon of the synthetic B.t. genes.

Four vectors encoding synthetic B.t.k. HD-73 genes
were constructed containing these signal peptides.
The synthetic truncated HD-73 gene from pMON5383 was
fused with the signal peptide sequence of PvuB and
incorporated into pMON893 to create pMON10827. The
synthetic truncated HD-73 gene from pMON5383 was also
fused with the signal peptide sequence of PR1b to
create pMON10824. The full-length synthetic HD-73
gene from pMON10518 was fused with the signal peptide
sequence of PvuB and incorporated into pMON893 to
create pMON10828. The full-length synthetic HD-73
gene from pMON10518 was also fused with the signal
peptide sequence of PR1b and incorporated into pMON893
to create pMON10825.

These vectors were used to transform tobacco plants,
and the plants were assayed for expression of the
B.t.k. protein by Western blot analysis and for
insecticidal efficacy. pMON10824 and pMON10827
produced amounts of B.t.k. protein in leaf comparable
to the truncated HD-73 vectors, pMON5383 and pMON5390.
pMON10825 and pMON10828 produced full-length B.t.k.
protein in amounts comparable to pMON10518. In all
cases, the plants were insecticidally active against
tobacco hornworm.
Claims:


1. In a method for improving the expression of a
heterologous gene in plants by modifying the
structural coding sequence of said gene, the
improvement which comprises reducing the occurrence of
polyadenylation signals selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

2. The method of Claim 1 further comprising the 
improvement of reducing the occurrence of ATTTA 
sequences within the structural coding sequence. 

3. A method for modifying a wild-type structural
gene sequence which encodes an insecticidal protein of
Bacillus thuringiensis to enhance the expression of
said protein in plants which comprises:

a) removing polyadenylation signals contained in
said wild-type gene while retaining a sequence
which encodes said protein; and

b) removing ATTTA sequences contained in said
wild-type gene while retaining a sequence
which encodes said protein.

4. A method of Claim 3 further comprising the
removal of self-complementary sequences and
replacement of such sequences with nonself-complementary
DNA comprising plant preferred codons while retaining
a structural gene sequence encoding said protein.

5. A method of Claim 4 further comprising the use
of plant preferred sequences in the removal of the
polyadenylation signals and ATTTA sequences.
6. A method of Claim 3 in which the
polyadenylation signals are selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

7. A method of Claim 4 in which the
polyadenylation signals are selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

8. A method of Claim 5 in which the
polyadenylation signals are selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

9. A method for modifying a wild-type structural
gene sequence which encodes an insecticidal protein of
Bacillus thuringiensis to enhance the expression of
said protein in plants which comprises:

a) identifying regions within said sequence with
greater than four consecutive adenine or
thymine nucleotides;

b) modifying the regions of step (a) which have
two or more polyadenylation signals within a
ten base sequence to remove said signals while
maintaining a gene sequence which encodes said
protein; and

c) modifying the 15-30 base regions surrounding
the regions of step (a) to remove major plant
polyadenylation signals, consecutive sequences
containing more than one minor polyadenylation
signal and consecutive sequences containing
more than one ATTTA sequence while maintaining
a gene sequence which encodes said protein.

10. A method of Claim 9 in which the major plant 
polyadenylation signals are selected from the group 
consisting of AATAAA and AATAAT. 

11. A method of Claim 10 in which the
polyadenylation signals are selected from the group
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT,
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

12. A method of Claim 11 further comprising the
use of plant preferred sequences in the removal of
polyadenylation signals and ATTTA sequences.

13. A structural gene which encodes an 
insecticidal protein of Bacillus thuringiensis, said 
gene being substantially devoid of polyadenylation 
signals and ATTTA sequences. 

14. A structural gene of Claim 13 which is
substantially devoid of polyadenylation signals
selected from the group consisting of AATAAA, AATAAT,
AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA,
AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA
and CATAAA.
15. A structural gene of Claim 13 which encodes an
insecticidal protein of B.t.k. HD-1 having the
sequence:

1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA 120
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200
201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG 400
401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 640
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 880
881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040
1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400
1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743

16. A structural gene of Claim 13 which encodes an insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80
81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120
121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160
161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200
201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240
241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280
281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320
321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360
361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG 400
401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440
441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480
481 GCAACCATCAATAGCCGTTACAACGACCTTACTAGGCTGA 520
521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAACAC 560
561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 600
601 ATTAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAG 640
641 TTTTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAG 680
681 AACCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAA 720
721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCT 760
761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800
801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCTAC 840
841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880
881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920
921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960
961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000
1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080
1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 1120
1121 TTTACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAAT 1160
1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200
1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240
1241 TCAGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTT 1280
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400
1401 CACTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAT 1440
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTA 1520
1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560
1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 1600
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640
1641 TGAAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATC 1680
1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720
1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760
1761 GGCTGAG 1767.



17. A structural gene of Claim 13 encoding an insecticidal protein of B.t.k. HD-1 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.



18. A structural gene of Claim 13 encoding an insecticidal protein derived from B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200



WO 90/10076 

PCT/US90/00778 



-128- 



201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880
1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920.


19. A structural gene of Claim 13 encoding the full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 1920
1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960
1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680
2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.



20. A structural gene of Claim 13 encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.


21. A structural gene of Claim 13 encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880
1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920
1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960
1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000
2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040
2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080
2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120
2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160
2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200
2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280
2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320
2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360
2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400
2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440
2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480
2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520
2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560
2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600
2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640
2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680
2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720
2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760
2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800
2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840
2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880
2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000
3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040
3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080
3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120
3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160
3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200
3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240
3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280
3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320
3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360
3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400
3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440
3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480
3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520
3521 TCTTGATGGAGGAA 3534.



22. A structural gene of Claim 13 which encodes an insecticidal protein of B.t.t. having the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40
41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80
81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240
241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280
281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320
321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360
361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400
401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440
441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480
481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520
521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560
561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600
601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640
641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680
681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720
721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760
761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800
801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTATCTTCACAGAATTC 880
881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920
921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960
961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000
1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040
1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080
1081 CTCGCTGTGTGGCCATCCGCAGTTTACTCAGGCGTCACAA 1120
1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGATGAGGC 1160
1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200
1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240
1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAACTTAA 1280
1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320
1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360
1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400
1401 GGTGAAAGCCTACAAGCTGCAATCTGGTGCTTCCGTTGTC 1440
1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480
1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520
1521 TGTGTCTTACTCTCAGAAGTACAGGGCAGGTATTCATTAC 1560
1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640
1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680
1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720
1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760
1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.



23. A structural gene of Claim 13 which encodes an insecticidal protein of B.t. entomocidus having the sequence:

1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40
41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80
81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120
121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160
161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240
241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280
281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360
361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440
441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520
521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACCACTATCAA 560
561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC 600
601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640
641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680
681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720
721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760
761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA 800
801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840
841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880
881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920
921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960
961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000
1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040
1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080
1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120
1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160
1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200
1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240
1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280
1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320
1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360
1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400
1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440
1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480
1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520
1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560
1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600
1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640
1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680
1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720
1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760
1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800
1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840
1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880
1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920
1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960
1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080
2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120
2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160
2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200
2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA 2240
2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280
2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320
2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360
2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400
2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440
2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480
2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520
2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560
2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600
2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640
2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680
2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720
2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760
2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800
2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGGACAGGAT 2840
2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880
2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920
2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960
2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000
3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040
3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080
3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120
3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160
3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200
3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240
3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280
3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320
3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360
3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400
3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440
3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480
3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567.


24. A structural gene of Claim 13 which encodes P2 insecticidal protein having the sequence:

1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40
41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80
81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG 120
121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160
161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200
201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240
241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280
281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320
321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT 360
361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400
401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440
441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480
481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520
521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560
561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600
601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640
641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680
681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720
721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760
761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800
801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880
881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920
921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960
961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000
1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040
1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080
1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120
1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160
1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200
1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240
1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280
1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320
1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400
1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440
1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480
1481 GATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520
1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560
1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA 1600
1601 GGTACACTTTGAGAGGCAATGGAAACAGCTACAACCTTTA 1640
1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680
1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720
1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760
1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800
1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840
1841 CTGGAACTCCATTTGATCTCATGAACATCATGTTTGTGCC 1880
1881 AACTAACCTCCCTCCATTGTACTAA 1905.



25. A plant transformation vector comprising a plant gene containing a structural gene of Claim 13.

26. A structural gene sequence of Claim 13 encoding a fusion protein comprising the N-terminal 610 amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, said gene having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920
1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960
1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080
2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120
2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160
2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200
2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240
2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320
2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360
2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400
2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440
2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480
2481 CTTCTCCTTGGACATCGATGTGGGATGTACTGACCTGAAT 2520
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560
2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600
2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640
2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680
2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720
2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760
2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800
2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840
2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA 2880
2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG 2920
2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960
2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 3080
3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC 3120
3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 3160
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 3200
3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 3240
3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280
3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 3360
3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 3400
3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 3440
3441 CTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAA 3480
3481 ACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCT 3520
3521 TGATGGAGGAA 3531.

27. A method of Claim 4 further comprising removal of sequences comprising more than five consecutive A+T or G+C bases.

28. A structural gene sequence of Claim 13 
comprising a majority of plant preferred codons. 

29. A structural gene encoding the coat protein of potato leaf roll virus, said gene having the sequence:

1 ATGAGTACTGTCGTGGTTAAGGGAAACGTGAACGGTGGTG 40
41 TTCAACAACCTAGAAGGAGAAGAAGGCAATCCCTTCGTAG 80
81 GAGAGCTAACAGAGTTCAGCCAGTGGTTATGGTCACTGCT 120
121 CCTGGGCAACCAAGAAGGAGAAGAAGGAGAAGAGGAGGTA 160
161 ATCGCAGATCAAGAAGAACTGGAGTTCCCAGAGGAAGAGG 200
201 TTCAAGCGAGACATTCGTGTTTACAAAGGACAACCTCGTG 240
241 GGCAACTCCCAAGGAAGTTTCACCTTCGGACCAAGTGTTT 280
281 CAGACTGTCCAGCATTCAAGGATGGAATACTCAAGGCTTA 320
321 CCATGAGTACAAGATCACAAGTATCTTGCTTCAGTTCGTC 360
361 AGCGAGGCCTCTTCCACCTCTCCAGGCTCCATCGCTTATG 400
401 AGTTAGATCCACATTGCAAAGTTTCATCCCTCCAGTCCTA 440
441 CGTCAACAAGTTCCAAATCACAAAGGGTGGTGCTAAGACC 480
481 TATCAAGCTCGTATGATCAACGGAGTTGAATGGCACGATT 520
521 CTTCTGAGGATCAGTGCAGAATCCTTTGGAAAGGAAATGG 560
561 AAAGTCTTCAGATCCAGCTGGATCTTTCAGAGTTACCATC 600
601 AGAGTTGCTCTTCAAAACCCAAAG 624.



30. A chimeric plant gene which comprises a 
structural coding sequence encoding an insecticidal 
protein of Bacillus thuringiensis, said structural 
coding sequence being modified to reduce the number of 
putative polyadenylation signals within said 
structural coding sequence. 

31. A chimeric plant gene of Claim 30 in which the 
polyadenylation signals are selected from the group 
consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, 
AAAATA, ATTAAA, AATTAA, AATACA and CATAAA. 

32. A chimeric plant gene of Claim 31 in which said structural coding sequence is further modified to reduce the number of ATTTA sequences within said structural coding sequence.

33. A chimeric plant gene of Claim 32 in which 
said structural coding sequence is substantially 
devoid of polyadenylation signals and ATTTA sequences. 

34. A transformed plant cell containing a gene of Claim 33.

35. A transformed plant cell of Claim 34 selected 
from the group consisting of soybean, cotton, alfalfa, 
oilseed rape, flax, tomato, sugarbeet, sunflower, 
potato, tobacco, maize, rice and wheat. 

36. A plant comprising transformed plant cells of Claim 34.

37. A plant of Claim 36 which comprises plant 
cells of Claim 35. 

38. A seed produced by a plant of Claim 36. 
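The fusion described in Claim 26 can be cross-checked against the length of its sequence listing. Assuming the listing is read in frame from the initial ATG, and that the GAA codon at positions 3529 to 3531 is the last codon shown (the listing includes no stop codon), the codon count works out as follows:

```latex
\underbrace{610}_{\text{HD-1, N-terminal}} + \underbrace{567}_{\text{HD-73, C-terminal}} = 1177\ \text{codons},
\qquad 1177 \times 3 = 3531\ \text{nucleotides}
```

This agrees with the final position, 3531, of the Claim 26 listing.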

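Claim 27 does not say how stretches of "more than five consecutive A+T or G+C bases" are located. Below is a minimal Python sketch that reads the phrase as a run of six or more bases drawn only from A and T, or only from G and C, and finds such runs with a regular expression. That reading is an assumption, and the function name `long_base_runs` is hypothetical. The two test strings are copied from FIG. 2A and from the Claim 26 gene. Both encode the same eleven amino acids (Ile-Glu-Thr-Gly-Tyr-Thr-Pro-Ile-Asp-Ile-Ser).

```python
import re

# Assumed reading of Claim 27: six or more consecutive bases that are all
# A/T, or all G/C, count as "more than five consecutive A+T or G+C bases".
LONG_RUN = re.compile(r"[AT]{6,}|[GC]{6,}")


def long_base_runs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, run) for each maximal A/T-only or G/C-only run of length >= 6."""
    return [(m.start() + 1, m.group()) for m in LONG_RUN.finditer(seq.upper())]


# Same amino acids, different codons: FIG. 2A stretch vs. Claim 26 stretch.
native = "ATAGAAACTGGTTACACCCCAATCGATATTTCC"
synthetic = "ATTGAAACCGGTTACACTCCCATCGACATCTCC"

print(long_base_runs(native))     # [(26, 'ATATTT')]
print(long_base_runs(synthetic))  # []
```

Under this reading, the FIG. 2A stretch contains one qualifying run (ATATTT) and the Claim 26 stretch contains none. To use a different threshold, change the `{6,}` quantifiers.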
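Claims 30 to 33 describe modifying a coding sequence to reduce the number of putative polyadenylation signals and ATTTA sequences it contains. The Python sketch below counts those motifs using the sixteen signals listed in Claim 31, and it includes overlapping matches. It is only an illustrative screen: the helper names are hypothetical, and it does not show how replacement codons were chosen. The test strings are stretches of FIG. 2A and of the Claim 26 gene that encode the same amino acids.

```python
# The sixteen putative polyadenylation signals listed in Claim 31.
POLYA_SIGNALS = frozenset({
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
})


def polya_signal_hits(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for every listed signal, overlaps included."""
    seq = seq.upper()
    # Every listed signal is six bases long, so one sliding 6-base window suffices.
    return [(i + 1, seq[i:i + 6]) for i in range(len(seq) - 5)
            if seq[i:i + 6] in POLYA_SIGNALS]


def attta_count(seq: str) -> int:
    """Count ATTTA pentamers (the motif of Claim 32), overlaps included."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 4) if seq[i:i + 5] == "ATTTA")


# FIG. 2A stretches vs. Claim 26 stretches encoding the same amino acids.
native_1 = "GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCT"
synthetic_1 = "GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCC"
native_2 = "GCAAATTTACATTTATCAGTTTTGAGA"
synthetic_2 = "GCTAATCTTCACCTCAGCGTGCTTCGA"

print(polya_signal_hits(native_1))     # [(9, 'AATTAA'), (13, 'AACCAA')]
print(polya_signal_hits(synthetic_1))  # []
print(attta_count(native_2), attta_count(synthetic_2))  # 2 0
```

In these examples the synthetic stretches contain neither the two signal matches nor the two ATTTA copies, yet they encode the same amino acids. For example, replacing the Leu codon TTA after AAT with CTT (also Leu) breaks one ATTTA.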

1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120
T C
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200
C C C G C G
201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
T
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400
CC C C
401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C C CC C CC C
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
601 ATAAGATATAATCAATTTAGAAGAGAATTAACACTAACTG 640
C G C C G C GC T
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720

FIG. 2A

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
841 ACGGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATC 880
C C C T C
881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
G C
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAATAT 1040
C
1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080
C C C C
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320
G C C C C C
1321 CCTTCATCACAAATTACACAAATACCTTTAACAAAATCTA 1360
C C C AC C C G
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400

FIG. 2B

1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 1560
CC T G C
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743

FIG. 2C

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC
41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G AT C T
81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC
121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT GAG GCCCGCGA
161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T
201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G
241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C
281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C
321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C
361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A
401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT
441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG
481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC
521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T
561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G
601 GGCAACTATACAGATCATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT
641 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 680
C G C T T

FIG. 3A

681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
T CCGCG GCCAT
721 TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 760
G C T G C C CTCC
761 CGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCTCT G CTC
801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC
841 CGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAGTC 880
T T T C A T C CTCC C C
881 CACATTTGATGGATATACTTAATAGTATAACCATCTATAC 920
C CCTGCC T C
921 GGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATCAA 960
C C G C TACG
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T
1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C
1081 ACATTATCGTCCACCTTATATAGAAGACCTTTTAATATAG 1120
CGT GC CC C
1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A
1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T
1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C
1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC
1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC
1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 1400
AT G C C C

FIG. 3B

1401 TTCATCACAAATTACACAAATACCTTTAACAAAATCTACT 1440
CT CC CAGCG
1441 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 1480
C A G C
1481 TTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGCCA 1520
C T A T
1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 1560
AGC CC TCC CTT
1561 CAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCACAA 1600
T C G T A A
1601 ATTTACAATTCCATACATCAATTGACGGAAGACCTATTAA 1640
CG CCCC G C
1641 TCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTAAT 1680
T C C C C TCA CCCC
1681 TTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTACTC 1720
GA C CACC C
1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 1760
TC CTC CTCCCT
1761 TGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAGAT 1800
C G T G C T C
1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 1840
T G GTC T C T
1841 AATAT 1845
G C

FIG. 3C

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC
41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T
81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC
121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT GAG GCCCGCGA
161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T
201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G
241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C
281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C
321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C
361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A
401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT
441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG
481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC
521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T
561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G
601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT
641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A

FIG. 4A

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT
721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC
761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC
801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC
841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C
881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C
921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C CAAGG C TACG
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T
1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C
1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C
1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A
1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T
1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C
1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC
1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC
1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C

FIG. 4B

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C
1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C
1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA
1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T
1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C
1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C
1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760
C C C C
1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C
1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
A TGCG
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
CTGT ACGTCTACA C AGCT G ACTC G CA TG
1921 G 1921

1 GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
ATGGCC T C T C C C
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
CTGAG GCCC GCGA
81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120
GCTCC CCC T
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
C A T C G G
161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200
G GC GGC G C
201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
G C G G T G C
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
C C T GAGC C C
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
C TC CC C G A
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
C CTGCA CA
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400
TG C CGCC CGC
401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C A T C T CC CAGC GC TC
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
C AGC G C T
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
AC C CCCCT G
521 TTGGCAACTATACAGATTATGCTGTACGCTGGTACAATAC 560
A CCCCC TT C
561 GGGATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGG 600
T C G G C T T
601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 640
ATACCGCG GCCA
641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG 680
T G C T GT C C CTCC

FIG. 8A
681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
CCCTCT G CTC

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
C T TCTGCCC C

761 TTCGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAG 800
CTTTCATC G CTCC C

801 TCCACATTTGATGGATATACTTAACAGTATAACCATCTAT 840
C C CCTG C T C

841 ACGGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATC 880
C CAAGG C TAC

881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
G C C A T A CAGC C G

921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
T C T C C C

961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
C T C C

1001 GAACATTATCGTCCACTTTATATAGAAGACCTTTTAATAT 1040
CGT CGC CC

1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080
CTCCCG TC A

1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
G C C T T C

1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
T G C T CT C

1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
C A C T C C

1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
TCC CAGG CGC C CA

1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
C C C TCC G C C C

1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320
C G C C C C C

1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
C

1361 ACTTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATT 1400
C C C C

FIG. 8B
1401 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 1440
C ACC CCCC

1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480

1481 TCCCATCGACATCTACCAGATATCGAGTTCGTGTACGGTA 1520
C A GA

1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560
G T

1561 AATTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTA 1600
C C T

1601 CGTCATTAGATAATCTACAATCAAGTGATTTTGGTTATTT 1640
CCG C CC C C

1641 TGAAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATA 1680
C C C C

1681 GTAGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAA 1720
G C T

1721 TAGACAGATTTGAATTTATTCCAGTTACTGCAACACTCGA 1760
C C G C

1761 GGCTGAA 1767

FIG. 8C


SUBSTITUTE SHEET 



WO 90/10076 


PCT/US90/00778 


18/46 

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CT GAG GCCCGCGA

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A

FIG. 9A
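The substitution rows printed beneath each native-sequence row consist largely of C and G. For a reader who wants to quantify base composition row by row, the following Python sketch computes the G+C fraction of one 40-nt row, here positions 1-40 of the native sequence in FIG. 9A. It is an illustrative calculation only, not a procedure described in these figures, and the helper name `gc_fraction` is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases in seq."""
    # Keep only nucleotide letters, so stray OCR spaces or dots are ignored.
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


# Positions 1-40 of the native sequence in FIG. 9A.
row_1_40 = "ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA"
print(f"{gc_fraction(row_1_40):.3f}")  # 0.325 (13 of 40 bases are G or C)
```

Applying the same function to the corresponding synthetic row would require the aligned substitutions, which this text rendering of the figures does not preserve.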
681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C CAAGG C TACG

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C

FIG. 9B
1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760
C C C C

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

FIG. 9C
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520

3521 TCCTTATGGAGGAA 3534

FIG. 9E
1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT
641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C CAAGG C TACG

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760
C C C C

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
G C C C G C

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
G C G G

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
C CC CAGC G C

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
G TC GCGGC

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
CCCCGG CGCGG

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
C C G CC C C

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
G G

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
G C C C C

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

FIG. 10D
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
C C

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
C C CGC CCC

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520

3521 TCCTTATGGAGGAA 3534

FIG. 10E
1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGC A CAT

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A

FIG. 11A


681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C A AG G C G

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C
1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560

1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760
C C C C

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
GCCTG C T C

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
CC CCCTGTCTG TC

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
TTC C C CGC

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
C CC TAGC G C C C C G T

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
CC T CC T CC

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
GAGCCTG CCC C

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
C G T T C C

FIG. 11C
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
C CCTGCGGC

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
CCCATCC CTC

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
C CGG GCCC

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
CAG CT CC CC

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
CT CCGCAG CGC

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
GCG C T CC A

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
T TC C T G T C

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
AT G G C

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
C C C C G C T

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
C GCG T C G

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
C A C C C C

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
C C A T C C T

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
G C T T C

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
G A G G G G C

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
C T C CGC

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
GCG GCG C G

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
G CCCCC CCC

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
C G C T G CT

FIG. 11D
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
T C CT GCTCCCG

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
CTGA CTC TGC

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
C C CGC CCC

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
C CAG T T G C G G

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
G TG C G GTG

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
T C G A A A

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
AA CTC GCT

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
C T G G C C

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
C C G T CTC C G A

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
CC CTTCCCC

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
G G G C AGC

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
CA T C T T C

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
CCGCGG CC CA

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
CT C CGC T C C

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
A T C TCGGCT

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
G TTG CAG C CT

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
C C GC T

3521 TCCTTATGGAGGAA 3534
T G

FIG. 11E
1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 40
CCCC CCCT

41 CAACAAAAGATGTCATTCAAAAAGGCATTTCCGTAGTAGG 80
CTG TCGGTC TG

81 TGATCTCCTAGGCGTAGTAGGTTTCCCGTTTGGTGGAGCG 120
AC T G GTATCC C

121 CTTGTTTCGTTTTATACAAACTTTTTAAATACTATTTGGC 160
C GAGC C CCCC

161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 200
CG T AAC G T

201 AGCATTGATGGATCAGAAAATAGCTGATTATGCAAAAAAT 240
TCT G T A CGC

241 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 280
GTG ACC GC G

281 AAGATTATGTGAGTGCATTGAGTTCATGGCAAAAAAATCC 320
G C C TCCAGC G G C

321 TGTGAGTTCACGAAATCCACATAGCCAGGGGCGGATAAGA 360
T C CA T C A TA C

361 GAGCTGTTTTCTCAAGCAGAAAGTCATTTTCGTAATTCAA 400
T C C TCC C CA A C

401 TGCCTTCGTTTGCAATTTCTGGATACGAGGTTCTATTTCT 440
AGC T C C T T C

441 AACAACATATGCACAAGCTGCCAACACACATTTATTTTTA 480
CTC T CCGCC

481 CTAAAAGACGCTCAAATTTATGGAGAAGAATGGGGATACG 520
T G C G

521 AAAAAGAAGATATTGCTGAATTTTATAAAAGACAACTAAA 560
G GC GCCGCT T

561 ACTTACGCAAGAATATACTGACCATTGTGTCAAATGGTAT 600
G C C G C C G

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 640
C TCC GC C CTCCG

641 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 680
G C A A CA G C

FIG. 12A
681 AACAGTATTAGATTTAATTGCACTATTTCCATTGTATGAT 720
GTGCCCTC C C C

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 760
GAAC G G TGCTC

761 GAGACGTTTTAACAGATCCAATTGTCGGAGTCAACAACCT 800
GC C T C T

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840
T T AGC C C C

841 ATTCGAAAACCACATCTATTTGACTATCTGCATAGAATTC 880
AG C C T C

881 AATTTCACACGCGGTTCCAACCAGGATATTATGGAAATGA 920
C AA T C T C

921 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960
C C C C C

961 CCAAGCATAGGATCAAATGATATAATCACATCTCCATTCT 1000
T T C C C

1001 ATGGAAATAAATCCAGTGAACCTGTACAAAATTTAGAATT 1040
TCG GGCCTG

1041 TAATGGAGAAAAAGTCTATAGAGCCGTAGCAAATACAAAT 1080
C C C G C C C

1081 CTTGCGGTCTGGCCGTCCGCTGTATATTCAGGTGTTACAA 1120
CTG A ATC CC

1121 AAGTGGAATTTAGCCAATATAATGATCAAACAGATGAAGC 1160
GGTGCGCG

1161 AAGTACACAAACGTACGACTCAAAAAGAAATGTTGGCGCG 1200
CCCGT CCTC A

1201 GTCAGCTGGGATTCTATCGATCAATTGCCTCCAGAAACAA 1240
TCT C C

1241 CAGATGAACCTCTAGAAAAGGGATATAGCCATCAACTCAA 1280
C AT G G CC C T

1281 TTATGTAATGTGCTTTTTAATGCAGGGTAGTAGAGGAACA 1320
C G C G A TCC G C

1321 ATCCCAGTGTTAACTTGGACACATAAAAGTGTAGACTTTT 1360
T G C C GTCC G C

1361 TTAACATGATTGATTCGAAAAAAATTACACAACTTCCGTT 1400
C C AGC G G C T C

FIG. 12B
1401 AGTAAAGGCATATAAGTTACAATCTGGTGCTTCCGTTGTC 1440
G G A C C C G

1441 GCAGGTCCTAGGTTTACAGGAGGAGATATCATTCAATGCA 1480
CACT TC CG

1481 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520
GCCCAT C G T

1521 TGTGTCGTACTCTCAAAAATATCGAGCTAGAATTCATTAT 1560
T G G CA G AC T C

1561 GCTTCTACATCTCAGATAACATTTACACTCAGTTTAGACG 1600
A CAGC C C C C G T

1601 GGGCACCATTTAATCAATACTATTTCGATAAAACGATAAA 1640
A CCCGTCTCGCC

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1680
C T TC C A C AGC C C G

1681 AGTTTCAGCACACCATTCGAATTATCAGGGAATAACTTAC 1720
T C C C C TC T

1721 AAATAGGCGTCACAGGATTAAGTGCTGGAGATAAAGTTTA 1760
GC CTCCCC C C

1761 TATAGACAAAATTGAATTTATTCCAGTGAAT 1791
C C G G C C C
1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 40
GAC C C C CTC T C C

41 GTGATGCGTATAATGTAGTAGCCCATGATCCATTTAGTTT 80
CCACCCGTC CC

81 TGAACATAAATCATTAGATACCATCCAAAAAGAATGGATG 120
C C GAGCC C C T T G G G

121 GAGTGGAAAAGAACAGATCATAGTTTATATGTAGCTCCTG 160
A C T T C CTC C C C C A

161 TAGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200
GT A CCCCTC GC

201 GAGTCTTATTGGAAAAAGGATATTGAGTGAATTATGGGGG 240
CTC C C CTC TCC C C T

241 ATAATATTTCCTAGTGGTAGTACAAATCTAATGCAAGATA 280
C C ATC GTCC T C C

281 TTTTAAGGGAGACAGAACAATTCCTAAATCAAAGACTTAA 320
CG C GTCCGCTC

321 TACAGATACCCTTGCTCGTGTAAATGCAGAATTGATAGGG 360
CT TG AACCTG CT

361 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400
ACTCT CCG GC

401 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATCAAT 440
CCGTA GT G CTC

441 AACTTCTTCGGTTAATACAATGCAGCAATTATTTCTAAAT 480
C CGCT CCCCC

481 AGATTACCCCAGTTCCAGATACAAGGATACCAGTTGTTAT 520
G T T T C C CC

521 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 560
TC T AC C T T C CT G

561 TTTTATTAGAGATGTTATTCTTAATGCAGATGAATGGGGT 600
C C AC TCGCCCTC A

601 ATTTCAGCAGCAACATTACGTACGTATCGAGATTACCTGA 640
C T C TC TA G A CA C T

641 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680
GCCTCT CCC CCC

FIG. 13A
681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 720
T G C C T AC C T TA GC T

721 GATATGTTAGAATTTAGAACATATATGTTTTTAAATGTAT 760
C CTGCGCC CCTCG

761 TTGAATATGTATCCATTTGGTCATTGTTTAAATATCAGAG 800
G C CAG AGTC C C G C

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 840
CTG GC AC CCC CTCT C

841 AGTGGACCACAGCAGACACAATCATTTACAGCACAAAACT 880
A T GAGC C T G

881 GGCCATTTTTATATTCTCTTTTCCAAGTTAATTCGAATTA 920
C G AGCT G C C C C

921 TATATTATCTGGTATTAGTGGTACTAGGCTTTCTATTACC 960
C TC CAG CTC G C A C C A

961 TTCCCTAATATTGGTGGTTTACCGGGTAGTACTACAACTC 1000
T C C AC T A CTCC C

1001 ATTCATTGAATAGTGCCAGGGTTAATTATAGCGGAGGAGT 1040
AGCC T CTC A G C C T T

1041 TTCATCTGGTCTCATAGGGGCGACTAATCTCAATCACAAC 1080
CAGC AT G T T A CT G C

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 1120
C TC C T G A C GAGC G

1121 TTGTTAGAAGTTGGCTGGATTCAGGTACAGATCGAGAGGG 1160
G GTCC T CAGC T C A

1161 CGTTGCTACCTCTACGAATTGGCAGACAGAATCCTTTCAA 1200
A A C A C G C

1201 ACAACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 1240
CCTCCTC A C T A

1241 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 1280
G CT CCC TA G C

1281 TTCTGGGGTTCCTTTAGTTATTAGAAACGAAGATCTAACA 1320
C T CCCCGT CCC

1321 AGACCGTTACACTATAACCAAATAAGAAATATAGAAAGTC 1360
CTACTTC GTGCC GTC

1361 CTTCGGGAACACCTGGTGGAGCACGGGCCTATTTGGTATC 1400
ACTTAAT AATCCCG
1401 TGTGCATAACAGAAAAAATAATATCTATGCCGCTAATGAA 1440
C GGCC CTCCG

1441 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 1480
CC TCCTA CT

1481 GATTTACTATATCGCCAATACATGCCACTCAAGTGAATAA 1520
CCCT C TC C

1521 TCAAACTCGAACATTTATTTCTGAAAAATTTGGAAATCAA 1560
GACCCCC GC

1561 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600
C GGCGTC T C A

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640
GCTTG C CC C

1641 TTTAAGAGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680
C G TAGC CTTCCCCT

1681 ACTATAAACGGTAGAGTTTATACTGTTTCAAATGTTAATA 1720
CC ACT CACT GC

1721 CCACTACAAATAACGATGGAGTTAATGATAATGGAGCTCG 1760
TAGCT C CCC CA

1761 TTTTTCAGATATTAATATCGGTAATATAGTAGCAAGTGAT 1800
A CAGC CCCTCCCG CTC C

1801 AATACTAATGTAACGCTAGATATAAATGTGACATTAAACT 1840
C CTTTGCC CCCT

1841 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880
T A C C

1881 AACTAATCTTCCACCACTTTAT 1902
C C T T G C

FIG. 13C
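Every row in these figures follows one numbering convention: the left number is the position of the row's first base and the right number the position of its last base, with 40 bases per row and a shorter final row where the sequence ends (as at position 1902 in FIG. 13C). A minimal Python sketch of that layout, using a hypothetical helper `number_rows` and applied here to the last two native rows of FIG. 13C, shows how the flanking numbers follow from the row width:

```python
def number_rows(seq: str, width: int = 40, start: int = 1) -> list[str]:
    """Split seq into rows of `width` bases, each labelled 'first bases last'."""
    rows = []
    for offset in range(0, len(seq), width):
        chunk = seq[offset:offset + width]
        first = start + offset           # position of the row's first base
        last = first + len(chunk) - 1    # position of the row's last base
        rows.append(f"{first} {chunk} {last}")
    return rows


# Positions 1841-1902 of the native sequence in FIG. 13C.
tail = "CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC" "AACTAATCTTCCACCACTTTAT"
for row in number_rows(tail, start=1841):
    print(row)
# 1841 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880
# 1881 AACTAATCTTCCACCACTTTAT 1902
```

The same arithmetic (first + length - 1) is what allows a missing right-hand number in a figure row to be recovered from its left-hand number.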
1 ATGGAGGAAAATAATCAAAATCAATGCATACCTTACAATT 40
G C C C T A C

41 GTTTAAGTAATCCTGAAGAAGTACTTTTGGATGGAGAACG 80
CG CA G TGCT

81 GATATCAACTGGTAATTCATCAATTGATATTTCTCTGTCA 120
CT C CTCCCCCT C

121 CTTGTTCAGTTTCTGGTATCTAACTTTGTACCAGGGGGAG 160
T G C CAGC C G T T

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 200
GCCTC C TCCC TC

201 TGGCCCTTCTCAATGGGATGCATTTCTAGTACAAATTGAA 240
T A C G G G

241 CAATTAATTAATGAAAGAATAGCTGAATTTGCTAGGAATG 280
GGCCGGC GCC C

281 CTGCTATTGCTAATTTAGAAGGATTAGGAAACAATTTCAA 320
CC CG GCTC

321 TATATATGTGGAAGCATTTAAAGAATGGGAAGAAGATCCT 360
CC GCC G GC

361 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGCTTTC 400
C G CCTGGCCAACA

401 GTATACTTGATGGGCTACTTGAAAGGGACATTCCTTCGTT 440
ACTGCCCTGGATCAC

441 TCGAATTTCTGGATTTGAAGTACCCCTTTTATCCGTTTAT 480
CA C CC TTCG GC

481 GCTCAAGCGGCCAATCTGCATCTAGCTATATTAAGAGATT 520
AT T C C CC TC CA

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 560
GCC G G CTC

561 TGTCAATGAAAACTATAATAGACTAATTAGGCATATTGAT 600
C GTCC TC C C

601 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 640
GCCC TCCCCTC

641 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 680
GCCCTG T T

681 ATATAATCGATTACGGAGAGACTTAACATTGACTGTATTA 720
C C CA G GA G CC C A T G

FIG. 14A
721 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGGAGAT 760
C T A C G C

761 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 800
CTCA G T C A C

801 TACGGACCCATTAATTAATTTTAATCCACAGTTACAGTCT 840
T CT CCCT G AAG

841 GTAGCTCAATTACCTACTTTTAACGTTATGGAGAGCAGCC 880
CCCTCAC C TC

881 GAATTAGAAATCCTCATTTATTTGATATATTGAATAATCT 920
TCGCACG CC CC

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 960
T CC CC GTCC

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 1000
T CA G C C CTCT T

1001 GTGGTAACATAACATCTCCTATATATGGAAGAGAGGCGAA 1040
G T C C C T A

1041 CCAGGAGCCTCCAAGATCCTTTACTTTTAATGGACCGGTA 1080
A C TAGT C C C C T A C

1081 TTTAGGACTTTATCAAATCCTACTTTACGATTATTACAGC 1120
CACGTC CGA GCC

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTGA 1160
T T C CC TA A

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 1200
G C T G C T C CTC C T C

1201 CGAGGAAGAGGTACGGTTGATTCTTTAACTGAATTACCGC 1240
A T AC C G C CCA

1241 CTGAGGATAATAGTGTGCCACCTCGCGAAGGATATAGTCA 1280
ACC CA G C CTCC

1281 TCGTTTATGTCATGCAACTTTTGTTCAAAGATCTGGAACA 1320
CAGGCC CCGGCTC T

1321 CCTTTTTTAACAACTGGTGTAGTATTTTCTTGGACCGATC 1360
ACCCTAATGCA T

1361 GTAGTGCAACTCTTACAAATACAATTGATCCAGAGAGAAT 1400
T C T C C G

FIG. 14B
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1401 TAATCAAATACCTTTAGTGAAAGGATTTAGAGTTTGGGGG 1440 

C CAGCGTCCTG A 

1441 GGCACCTCTGTCATTACAGGACCAGGATTTACAGGAGGGG 1480 

AT C C C T 

• • • • 

1481 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520 

T A C T C C GAGC 

1521 ACAAGTCAATATTAATTCACCAATTACCCAAAGATACCGT 1560 

C TCCCT T T 

• • ■ • 

1561 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1600 

C C G A TTCCC T C TA C 

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 

C GC CCCATTCTCTA 
• * • ■ 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA 1680 

CTCC G C AC G G C 

• • • . 

1681 GGGGAGAACTTAACATCTAGAACATTTAGATATACCGATT 1720 

C G CGCC C C 

1721 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 1760 

CTC C CAGT CC T C C T C C 

1761 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 

CTC C AT AGC C 

1801 AGTAGCGGTGAACTTTATATAGATAAAATTGAAATTATTC 1840 

TCATCT C TGCTCG GC 

1841 TAGCAGATGCAACATTTGAAGCAGAATCTGATTTAGAAAG 1880 

TCCTCCCGTG ACA CC T G 

1881 AGCACAAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920 

C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 1960 

GCTCG TACTTC C 

1961 ATCAAGTATCCAATTTAGTGGATTGTTTATCAGATGAATT 2000 

C G C G CACC ACC TAGC G 

2001 TTGTCTGGATGAAAAGCGAGAATTGTCCGAGAAAGTCAAA 2040 

CCCCG TCC T 

2041 CATGCGAAGCGACTCAGTGATGAGCGGAATTTACTTCAAG 2080 

CC T CCA CCTG 

2081 ATCCAAACTTCAGAGGGATCAATAGACAACCAGACCGTGG 2120

CT C A AC C G G A 

FIG. 14C 
2121 CTGGAGAGGAAGTACAGATATTACCATCCAAGGAGGAGAT 2160
TGT CCGGC CC

2161 GACGTATTCAAAGAGAATTACGTCACACTACCGGGTACCG 2200
TG G C CCTCATT

2201 TTGATGAGTGCTATCCAACGTATTTATATCAGAAAATAGA 2240
CC CTCCGC GC

2241 TGAGTCGAAATTAAAAGCTTATACCCGTTATGAATTAAGA 2280
C CC CTC AG CCT

2281 GGGTATATCGAAGATAGTCAAGACTTAGAAATCTATTTGA 2320
CC CC CT CC

2321 TCCGTTACAATGCAAAACACGAAATAGTAAATGTGCCAGG 2360
AG CG G CC G C

2361 CACGGGTTCCTTATGGCCGCTTTCAGCCCAAATGCCAATC 2400
T T C C A T TCT C T

2401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCACACCTTG 2440
G G T CA T

2441 AATGGAATCCTGATCTAGATTGTTCCTGCAGAGACGGGGA 2480
G CTGCC GTC

2481 AAAATGTGCACATCATTCCCATCATTTCACCTTGGATATT 2520
GG CC T CT CC

2521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 2560
G TCG CCAC

2561 GGGTGATATTCAAGATTAAGACGCAAGATGGCCATGCAAG 2600
C C C C C A C

2601 ACTAGGGAATCTAGAGTTTCTCGAAGAGAAACCATTATTA 2640
T C C T GG C

2641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 2680
T T C G A

2681 GGAGAGACAAACGAGAGAAACTGCAGTTGGAAACAAATAT 2720
G T CG A G T C

2721 TGTTTATAAAGAGGCAAAAGAATCTGTAGATGCTTTATTT 2760
C CG C GCG GC

2761 GTAAACTCTCAATATGATAGATTACAAGTGGATACGAACA 2800
G C CAG G CC C C

2801 TCGCCATGATTCATGCGGCAGATAAACGCGTTCATAGAAT 2840
CCC C TGCC

FIG. 14D 
2841 CCGGGAAGCGTATCTGCCAGAGTTGTCTGTGATTCCAGGT 2880
TTGTCT T C CT

2881 GTCAATGCGGCCATTTTCGAAGAATTAGAGGGACGTATTT 2920
GCT C GCT C

2921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2960
CATC GC C C C

2961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000
G C T C C C CAGC T

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 3040
GCGGAG TG

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAGAAGTGTCACA 3080
C GGGTG AT C

3081 AGAGGTTCGTGTCTGTCCAGGTCGTGGCTATATCCTTCGT 3120
A A A A C T C

3121 GTCACAGCATATAAAGAGGGATATGGAGAGGGCTGCGTAA 3160
GCTCG CT T G

3161 CGATCCATGAGATCGAAGACAATACAGACGAACTGAAATT 3200
C C GACC GTG

3201 CAGCAACTGTGTAGAAGAGGAAGTATATCCAAACAACACA 3240
TC CCGAAC C C

3241 GTAACGTGTAATAATTATACTGGGACTCAAGAAGAATATG 3280
TTCCGCC TA G GC

3281 AGGGTACGTACACTTCTCGTAATCAAGGATATGACGAAGC 3320
GA G C AGC CAG T CA

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 3360
TCC TCXXXXXXXXXXXX T T C T C C

3361 GTCTATGAAGAAAAATCGTATACAGATGGACGAAGAGAGA 3400
GCGG CC CACT

3401 ATCCTTGTGAATCTAACAGAGGCTATGGGGATTACACACC 3440
C C G TC T CA C

3441 ACTACCGGCTGGTTATGTAACAAAGGATTTAGAGTACTTC 3480
TAT C TC GCT T

3481 CCAGAGACCGATAAGGTATGGATTGAGATCGGAGAAACAG 3520
T CAGC T C

3521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTTAT 3560
G C C GC T T G

3561 GGAGGAA 3567

FIG. 14E
1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG
GATC

41 GGAAACGTCAACGGTGGTGTACAACAACCTAGAAGGAGGA

81 GAAGGCAATCCCTTCGCAGGAGGGCTAACAGAGTACAGCC
T A T

121 AGTGGTTATGGTCACTGCTCCTGGCGAACCCAGGAGGAGG
GC A A A

161 AGACGCAGAAGAGGAGGCAATCGCAGGTCAAGAAGAACTG
AG T A

201 GAGTTCCCAGGGGAAGGGGCTCAAGCGAGACATTCGTGTT
A AT

241 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC

281 ACCTTCGGACCAAGTGTATCAGACTGTCCAGCATTCAAGG
T

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATCACAAG
T

361 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA
T G T

401 CCAGGATCCATCGCTTATGAGTTGGACCCACATTGCAAAG
C AT

441 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC
T

481 AAAGGGAGGAGCTAAGACCTATCAAGCTAGGATGATCAAC
T T C T

521 GGAGTAGAATGGCACGATTCATCTGAGGATCAGTGCAGGA

561 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGCAGG

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC

641 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA
A T


FIG. 16A 
681 ACCAACACCCACTCCAACTCCCCAAAAGCATGAGCGATTT
721 ATTGCTTACGTCGGCATACCTATGCTGACCATTCAAGAAT 

761 TC 762 


FIG. 16B 